Skip to content

Commit 69dd981

Browse files
authored
Refactor feed processing into testable utilities with comprehensive tests
This PR refactors feed processing code into small, testable utility functions with comprehensive test coverage using YAML test data and GitHub Actions CI. Key improvements: - Extracted feed processing logic into pure utility functions - Added comprehensive test coverage with YAML test data - Set up GitHub Actions CI to run tests automatically - Split tests into separate files for article and feed utilities - Added Redis key prefix constants for maintainability - Improved code documentation with explanatory comments All tests passing (26/26) ✓
1 parent e735a84 commit 69dd981

File tree

12 files changed

+658
-30
lines changed

12 files changed

+658
-30
lines changed

.github/workflows/test.yml

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
name: Run Tests
2+
3+
on:
4+
push:
5+
branches: [ master ]
6+
pull_request:
7+
branches: [ master ]
8+
9+
jobs:
10+
test:
11+
runs-on: ubuntu-latest
12+
13+
strategy:
14+
matrix:
15+
node-version: [18.x, 20.x]
16+
17+
steps:
18+
- uses: actions/checkout@v3
19+
20+
- name: Use Node.js ${{ matrix.node-version }}
21+
uses: actions/setup-node@v3
22+
with:
23+
node-version: ${{ matrix.node-version }}
24+
25+
- name: Install dependencies
26+
run: npm install
27+
28+
- name: Run article utility tests
29+
run: node src/lib/articleUtils.test.js
30+
31+
- name: Run feed utility tests
32+
run: node src/lib/feedUtils.test.js

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,4 +7,4 @@ aws-config.json
77
.env
88
dump.rdb
99
npm-debug.log
10-
lib
10+
/lib

TESTING.md

Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,103 @@
1+
# Testing Guide for Feed Processing Functions
2+
3+
This document describes the refactored, testable feed processing utilities and how to run tests.
4+
5+
## Overview
6+
7+
The feed processing logic has been extracted into small, testable functions in `src/lib/`:
8+
9+
- **`articleUtils.js`** - Pure functions for article hashing and scoring (no external dependencies)
10+
- **`feedUtils.js`** - Utility functions for feed processing (headers, Redis keys, validation, etc.)
11+
12+
## Running Tests
13+
14+
```bash
15+
node src/lib/feedUtils.test.js
16+
```
17+
18+
All tests use the test data in `testdata/test-cases.json` which contains expected inputs and outputs generated from the actual Node.js implementation.
19+
20+
## Test Coverage
21+
22+
### Article Functions (`articleUtils.js`)
23+
24+
1. **`hash(article)`** - MD5 hash of article GUID
25+
- Tests: 3 test cases verifying hash consistency
26+
- Implementation matches `src/articles.js` exactly
27+
28+
2. **`score(article)`** - Unix timestamp score
29+
- Tests: 3 test cases with different date field names (pubDate, pubdate, date)
30+
- Implementation matches `src/articles.js` exactly
31+
32+
### Feed Functions (`feedUtils.js`)
33+
34+
1. **`buildRequestHeaders(storedFeed)`** - Builds HTTP headers for conditional GET
35+
- Tests: 4 test cases (no headers, If-Modified-Since, If-None-Match, both)
36+
37+
2. **`buildRedisKeys(feedURI)`** - Creates Redis key names
38+
- Tests: 2 test cases with different feed URLs
39+
40+
3. **`buildArticleKey(hash)`** - Creates article key for Redis sorted set
41+
- Tests: 1 test case verifying format
42+
43+
4. **`processArticle(article, feedURI, hashFn, scoreFn)`** - Adds computed fields
44+
- Tests: 1 test case verifying hash, score, and feedurl are added
45+
46+
5. **`shouldStoreArticle(oldScore, newScore)`** - Determines if article needs S3 storage
47+
- Tests: 4 test cases (new article, changed score, unchanged score, type coercion)
48+
49+
6. **`isValidArticle(article)`** - Validates article has required fields
50+
- Tests: 4 test cases (valid, missing guid, missing description, null)
51+
52+
7. **`extractFeedMetadata(meta)`** - Extracts title and link from parser meta
53+
- Tests: 1 test case
54+
55+
8. **`extractArticleIds(articleKeys)`** - Strips "article:" prefix from Redis keys
56+
- Tests: 1 test case
57+
58+
## Test Data Format
59+
60+
The `testdata/test-cases.json` file contains test cases organized by function:
61+
62+
```json
63+
{
64+
"hash_function_tests": [...],
65+
"score_function_tests": [...],
66+
"request_headers_tests": [...],
67+
...
68+
}
69+
```
70+
71+
Each test case has:
72+
- `description` - Human-readable test description
73+
- `input` - Input value(s) for the function
74+
- `expected` - Expected output value
75+
76+
## Adding New Tests
77+
78+
1. Add test data to `testdata/test-cases.json`
79+
2. Add corresponding test code in `src/lib/feedUtils.test.js`
80+
3. Run tests to verify
81+
82+
## Future Work
83+
84+
Next steps:
85+
1. Refactor `src/feeds.js` to use these utility functions
86+
2. Add integration tests for Redis and S3 operations
87+
3. Create Go implementation with matching behavior (in `feedfetcher/` directory)
88+
4. Create Go tests that use the same `testdata/test-cases.json` file
89+
90+
## Why These Functions?
91+
92+
These functions were extracted because they are:
93+
1. **Pure or nearly pure** - Deterministic output for given input
94+
2. **Core business logic** - Critical for feed processing correctness
95+
3. **Reusable** - Can be used by both Node.js and Go implementations
96+
4. **Independently testable** - No mocking of Redis/S3 needed
97+
98+
The goal is to ensure both Node.js and Go implementations produce identical results for:
99+
- Article hashing (critical for deduplication)
100+
- Article scoring (critical for sorting)
101+
- Request headers (critical for conditional GET optimization)
102+
- Redis key naming (critical for data storage)
103+
- S3 storage decisions (critical for performance)

package.json

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,7 @@
4747
"eslint-config-airbnb": "^11.1.0",
4848
"eslint-plugin-import": "^1.15.0",
4949
"eslint-plugin-jsx-a11y": "^2.2.2",
50-
"eslint-plugin-react": "^6.2.0"
50+
"eslint-plugin-react": "^6.2.0",
51+
"js-yaml": "^4.1.0"
5152
}
5253
}

src/articles.js

Lines changed: 5 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,11 @@
1-
import crypto from 'crypto';
21
import AWS from 'aws-sdk';
32
import labels from './labels';
3+
// Import hash and score functions from testable utilities
4+
import { hash as hashArticle, score as scoreArticle } from './lib/articleUtils.js';
45

5-
export function hash(article) {
6-
return crypto.createHash('md5').update(article.guid).digest('hex');
7-
}
8-
9-
export function score(article) {
10-
const articleDate = article.pubDate || article.pubdate || article.date;
11-
const articleScore = Date.parse(articleDate) || Date.now();
12-
return articleScore;
13-
}
6+
// Re-export for backward compatibility
7+
export const hash = hashArticle;
8+
export const score = scoreArticle;
149

1510
function post(req, res) {
1611
res.json({

src/feeds.js

Lines changed: 21 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,16 @@ import FeedParser from 'feedparser';
33
import request from 'request';
44
import AWS from 'aws-sdk';
55
import { hash, score } from './articles';
6+
import {
7+
buildRequestHeaders,
8+
buildRedisKeys,
9+
buildArticleKey,
10+
processArticle,
11+
shouldStoreArticle,
12+
isValidArticle,
13+
extractArticleIds,
14+
generateArticleBody,
15+
} from './lib/feedUtils.js';
616

717
const redisURL = process.env.REDIS_URL;
818
const redisClient = redis.createClient(redisURL);
@@ -83,7 +93,7 @@ function get(req, res) {
8393
const feed = storedFeed;
8494
feed.key = feedurl;
8595
feeds.push(feed);
86-
const articleIDs = articles.map(key => key.substr(8));
96+
const articleIDs = extractArticleIds(articles);
8797
if (feedurlPosition === feedurls.length - 1) {
8898
res.json({
8999
success: true,
@@ -111,17 +121,12 @@ const feed = {
111121
const params = { Bucket: 'feedreader2018-articles' };
112122
const s3 = new AWS.S3({ params });
113123
const feedURI = decodeURIComponent(req.url.slice(10));
114-
const feedKey = `feed:${feedURI}`;
115-
const articlesKey = `articles:${feedURI}`;
124+
const { feedKey, articlesKey } = buildRedisKeys(feedURI);
116125

117126
redisClient.hgetall(feedKey, (e, storedFeed) => {
118127
let fetchedFeed = {};
119128
if ((!e) && storedFeed) fetchedFeed = storedFeed;
120-
const headers = {
121-
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36',
122-
};
123-
if (fetchedFeed.lastModified) headers['If-Modified-Since'] = fetchedFeed.lastModified;
124-
if (fetchedFeed.etag) headers['If-None-Match'] = fetchedFeed.etag;
129+
const headers = buildRequestHeaders(fetchedFeed);
125130

126131
const requ = request({
127132
uri: feedURI,
@@ -187,16 +192,14 @@ const feed = {
187192
const stream = this;
188193
for (;;) {
189194
const article = stream.read();
190-
if (!article || !article.guid || !article.description) {
195+
if (!isValidArticle(article)) {
191196
return;
192197
}
193-
article.hash = hash(article);
194-
article.score = score(article);
195-
article.feedurl = feedURI;
196198

197-
const key = article.hash;
198-
const rank = article.score;
199-
const articleKey = `article:${key}`;
199+
const processedArticle = processArticle(article, feedURI, hash, score);
200+
const key = processedArticle.hash;
201+
const rank = processedArticle.score;
202+
const articleKey = buildArticleKey(key);
200203

201204
redisClient.zscore(articlesKey, articleKey, (zscoreErr, oldscore) => {
202205
if (zscoreErr) {
@@ -211,9 +214,9 @@ const feed = {
211214
articleAddErr.type = 'Redis Error';
212215
articleAddErr.log = zaddErr.message;
213216
stream.emit('error', articleAddErr);
214-
} else if ((oldscore === null) || (rank !== parseInt(oldscore))) {
217+
} else if (shouldStoreArticle(oldscore, rank)) {
215218
// Only stringify when we actually need to store it
216-
const body = JSON.stringify(article);
219+
const body = generateArticleBody(processedArticle);
217220
s3.putObject({
218221
Key: key,
219222
Body: body,
@@ -245,7 +248,7 @@ const feed = {
245248
});
246249
} else {
247250
fetchedFeed.success = true;
248-
fetchedFeed.articles = allArticles.map(key => key.substr(8));
251+
fetchedFeed.articles = extractArticleIds(allArticles);
249252
res.json(fetchedFeed);
250253
}
251254
});

src/lib/articleUtils.js

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
// Pure utility functions for article processing (no external dependencies)
2+
// These can be tested without AWS or Redis
3+
4+
const crypto = require('crypto');
5+
6+
/**
7+
* Generates MD5 hash of article GUID
8+
* Reference: api/src/articles.js hash() function
9+
* @param {Object} article - Article object with guid field
10+
* @returns {string} MD5 hash in hex format
11+
*/
12+
function hash(article) {
13+
return crypto.createHash('md5').update(article.guid).digest('hex');
14+
}
15+
16+
/**
17+
* Generates score (timestamp) for article
18+
* Reference: api/src/articles.js score() function
19+
* @param {Object} article - Article object with date fields
20+
* @returns {number} Unix timestamp in milliseconds
21+
*/
22+
function score(article) {
23+
const articleDate = article.pubDate || article.pubdate || article.date;
24+
const articleScore = Date.parse(articleDate) || Date.now();
25+
return articleScore;
26+
}
27+
28+
module.exports = { hash, score };

src/lib/articleUtils.test.js

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
// Tests for article utility functions (hash and score)
2+
// Run with: node src/lib/articleUtils.test.js
3+
4+
const { hash, score } = require('./articleUtils.js');
5+
const fs = require('fs');
6+
const yaml = require('js-yaml');
7+
const assert = require('assert');
8+
9+
// Load test cases from YAML
10+
const testCasesYaml = fs.readFileSync('./testdata/test-cases.yaml', 'utf8');
11+
const testCases = yaml.load(testCasesYaml);
12+
13+
// Simple test runner
14+
let passed = 0;
15+
let failed = 0;
16+
17+
function test(name, fn) {
18+
try {
19+
fn();
20+
passed++;
21+
console.log(`✓ ${name}`);
22+
} catch (error) {
23+
failed++;
24+
console.error(`✗ ${name}`);
25+
console.error(` ${error.message}`);
26+
}
27+
}
28+
29+
// Run all tests
30+
console.log('\n=== Testing Article Utility Functions ===\n');
31+
32+
// Test hash function
33+
testCases.hash_function_tests.forEach((testCase) => {
34+
test(testCase.description, () => {
35+
const result = hash(testCase.input);
36+
assert.strictEqual(result, testCase.expected,
37+
`Hash mismatch: got ${result}, expected ${testCase.expected}`);
38+
});
39+
});
40+
41+
// Test score function
42+
testCases.score_function_tests.forEach((testCase) => {
43+
test(testCase.description, () => {
44+
const result = score(testCase.input);
45+
if (testCase.expected_type === 'timestamp') {
46+
// For invalid dates that fallback to Date.now(), just check it's a number
47+
assert.strictEqual(typeof result, 'number',
48+
`Score should be a number: got ${typeof result}`);
49+
assert.ok(result > 0, `Score should be positive: got ${result}`);
50+
} else {
51+
assert.strictEqual(result, testCase.expected,
52+
`Score mismatch: got ${result}, expected ${testCase.expected}`);
53+
}
54+
});
55+
});
56+
57+
// Print summary
58+
console.log(`\n=== Test Summary ===`);
59+
console.log(`Passed: ${passed}`);
60+
console.log(`Failed: ${failed}`);
61+
console.log(`Total: ${passed + failed}\n`);
62+
63+
process.exit(failed > 0 ? 1 : 0);

0 commit comments

Comments
 (0)