diff --git a/README.md b/README.md
index ed9718d..500b339 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Technology Reports API (Node.js)
+# Reports API
 
-This is a unified Google Cloud Run function that provides technology metrics and information via various endpoints.
+This is an HTTP Archive Reporting API that provides reporting data via various endpoints.
 
 ## Setup
 
@@ -12,7 +12,6 @@ This is a unified Google Cloud Run function that provides technology metrics and
 - Set environment variables:
 
   ```bash
-  export PROJECT=httparchive
   export DATABASE=tech-report-api-prod
   ```
 
@@ -20,47 +19,98 @@ This is a unified Google Cloud Run function that provides technology metrics and
 
   ```bash
   npm install
-  npm start:functions
+  npm run start
   ```
 
-The API will be available at
+The API will be available at
 
-### Google Cloud Functions Mode
-
-  ```bash
-  npm install
-  npm run start:functions
-  ```
+## API Endpoints
 
-The function will run on `http://localhost:8080`
+- **CORS Enabled**: Cross-origin requests are supported
+- **Cache Headers**: 6-hour cache control for static data
+- **Health Check**: GET `/` returns health status
+- **RESTful API**: All endpoints follow REST conventions
+- **Backend caching**: Some responses are cached on the backend for 1 hours to improve latency
 
-## Deployment
+### `GET /`
 
-### Deploy to Google Cloud Run Function
+Health check endpoint that returns the current status of the API.
 
 ```bash
-# Deploy to Google Cloud Functions
-gcloud functions deploy tech-report-api \
-  --runtime nodejs22 \
-  --trigger-http \
-  --allow-unauthenticated \
-  --entry-point api \
-  --source .
+curl --request GET \
+  --url 'https://{{HOST}}/v1/health'
 ```
 
-## API Endpoints
+Returns a JSON object with the following schema:
 
-## Features
+```json
+{
+  "status": "ok"
+}
+```
 
-- **ETag Support**: All endpoints include ETag headers for efficient caching
-- **CORS Enabled**: Cross-origin requests are supported
-- **Cache Headers**: 6-hour cache control for static data
-- **Health Check**: GET `/` returns health status
-- **RESTful API**: All endpoints follow REST conventions
+### `GET /categories`
 
-### `GET /`
+Lists available categories.
+
+#### Categories Parameters
+
+- `category` (optional): Filter by category name(s) - comma-separated list
+- `onlyname` (optional): If present, returns only category names
+- `fields` (optional): Comma-separated list of fields to include in the response (see [Field Selection API Documentation](#field-selection-api-documentation) for details)
+
+#### Categories Response
+
+```bash
+curl --request GET \
+  --url 'https://d{{HOST}}/v1/categories?category=Domain%20parking%2CCI'
+```
+
+```json
+[
+  {
+    "description": "Systems that automate building, testing, and deploying code",
+    "technologies": [
+      "Jenkins",
+      "TeamCity"
+    ],
+    "origins": {
+      "mobile": 22,
+      "desktop": 35
+    },
+    "category": "CI"
+  },
+  {
+    "description": "Solutions that redirect domains to a different location or page",
+    "technologies": [
+      "Cloudflare",
+      "Arsys Domain Parking"
+    ],
+    "origins": {
+      "mobile": 14,
+      "desktop": 8
+    },
+    "category": "Domain parking"
+  }
+]
+```
 
-Health check
+```bash
+curl --request GET \
+  --url 'https://{{HOST}}/v1/categories?onlyname'
+```
+
+```json
+[
+  "A/B Testing",
+  "Accessibility",
+  "Accounting",
+  "Advertising",
+  "Affiliate programs",
+  "Analytics",
+  ...
+]
+```
 
 ### `GET /technologies`
 
@@ -71,6 +121,7 @@ Lists available technologies with optional filtering.
 - `technology` (optional): Filter by technology name(s) - comma-separated list
 - `category` (optional): Filter by category - comma-separated list
 - `onlyname` (optional): If present, returns only technology names
+- `fields` (optional): Comma-separated list of fields to include in the response (see [Field Selection API Documentation](#field-selection-api-documentation) for details)
 
 #### Example Request & Response
 
@@ -117,68 +168,40 @@ Returns a JSON object with the following schema:
 }
 ```
 
-### `GET /categories`
+### `GET /versions`
 
-Lists available categories.
+Lists available versions.
 
-#### Categories Parameters
+#### Versions Parameters
 
-- `category` (optional): Filter by category name(s) - comma-separated list
-- `onlyname` (optional): If present, returns only category names
+- `version` (optional): Filter by version name(s) - comma-separated list
+- `technology` (optional): Filter by technology name(s) - comma-separated list
+- `category` (optional): Filter by category - comma-separated list
+- `onlyname` (optional): If present, returns only version names
+- `fields` (optional): Comma-separated list of fields to include in the response (see [Field Selection API Documentation](#field-selection-api-documentation) for details)
 
-#### Categories Response
+#### Versions Response
 
 ```bash
 curl --request GET \
-  --url 'https://d{{HOST}}/v1/categories?category=Domain%20parking%2CCI'
+  --url 'https://{{HOST}}/v1/versions?technology=WordPress&version=6.2.2'
 ```
 
+Returns a JSON object with the following schema:
+
 ```json
 [
   {
-    "description": "Systems that automate building, testing, and deploying code",
-    "technologies": [
-      "Jenkins",
-      "TeamCity"
-    ],
-    "origins": {
-      "mobile": 22,
-      "desktop": 35
-    },
-    "category": "CI"
-  },
-  {
-    "description": "Solutions that redirect domains to a different location or page",
-    "technologies": [
-      "Cloudflare",
-      "Arsys Domain Parking"
-    ],
+    "technology": "WordPress",
+    "version": "6.2.2",
     "origins": {
-      "mobile": 14,
-      "desktop": 8
-    },
-    "category": "Domain parking"
+      "mobile": 123456,
+      "desktop": 654321
+    }
   }
 ]
 ```
 
-```bash
-curl --request GET \
-  --url 'https://{{HOST}}/v1/categories?onlyname'
-```
-
-```json
-[
-  "A/B Testing",
-  "Accessibility",
-  "Accounting",
-  "Advertising",
-  "Affiliate programs",
-  "Analytics",
-  ...
-]
-```
-
 ### `GET /adoption`
 
 Provides technology adoption data.
@@ -204,9 +227,7 @@ Returns a JSON object with the following schema:
 [
   {
     "technology": "GoCache",
-    "geo": "Mexico",
     "date": "2023-06-01",
-    "rank": "ALL",
     "adoption": {
       "mobile": 19,
       "desktop": 11
@@ -239,9 +260,7 @@ curl --request GET \
 ```json
 [
   {
-    "geo": "Uruguay",
     "date": "2023-06-01",
-    "rank": "ALL",
     "technology": "DomainFactory",
     "vitals": [
       {
@@ -285,9 +304,7 @@ Returns a JSON object with the following schema:
 ```json
 [
   {
-    "geo": "Maldives",
     "date": "2023-06-01",
-    "rank": "ALL",
     "technology": "Oracle HTTP Server",
     "lighthouse": [
       {
@@ -334,15 +351,38 @@ Returns a JSON object with the following schema:
 ```json
 [
   {
-    "client": "desktop",
-    "date": "2023-07-01",
-    "geo": "ALL",
-    "median_bytes_image": "1048110",
-    "technology": "WordPress",
-    "median_bytes_total": "2600099",
-    "median_bytes_js": "652651",
-    "rank": "ALL"
-  }
+    "date": "2020-06-01",
+    "pageWeight": [
+      {
+        "desktop": {
+          "median_bytes": 2428028
+        },
+        "mobile": {
+          "median_bytes": 2430912
+        },
+        "name": "total"
+      },
+      {
+        "desktop": {
+          "median_bytes": 490451
+        },
+        "mobile": {
+          "median_bytes": 477218
+        },
+        "name": "js"
+      },
+      {
+        "desktop": {
+          "median_bytes": 1221876
+        },
+        "mobile": {
+          "median_bytes": 1296673
+        },
+        "name": "images"
+      }
+    ],
+    "technology": "WordPress"
+  },
   ...
 ]
 ```
@@ -351,10 +391,71 @@ Returns a JSON object with the following schema:
 
 Lists all available ranks.
 
+#### Ranks Response
+
+```bash
+curl --request GET \
+  --url 'https://{{HOST}}/v1/ranks'
+```
+
+Returns a JSON object with the following schema:
+
+```json
+[
+  {
+    "rank": "ALL"
+  },
+  {
+    "rank": "Top 10M"
+  },
+  ...
+]
+```
+
 ### `GET /geos`
 
 Lists all available geographic locations.
 
+#### Geos Response
+
+```bash
+curl --request GET \
+  --url 'https://{{HOST}}/v1/geos'
+```
+
+Returns a JSON object with the following schema:
+
+```json
+[
+  {
+    "geo": "ALL"
+  },
+  {
+    "geo": "United States of America"
+  },
+  ...
+]
+```
+
+### `GET /cache-stats`
+
+Provides statistics about the API's cache.
+
+```bash
+curl --request GET \
+  --url 'https://{{HOST}}/v1/cache-stats'
+```
+
+Returns a JSON object with the following schema:
+
+```json
+{
+  "cache_hits": 12345,
+  "cache_misses": 6789,
+  "last_cleared": "2023-10-01T12:00:00Z"
+}
+```
+
 ## Testing
 
 ```bash
@@ -392,29 +493,36 @@ Or in case of an error:
 
 The categories and technologies endpoints now support custom field selection, allowing clients to specify exactly which fields they want in the response. This feature helps reduce payload size and improves API performance by returning only the needed data.
 
-### Endpoints Supporting Field Selection
+### Categories Endpoint
 
-- `GET /v1/technologies`
-- `GET /v1/categories`
+- `category` - Category name
+- `description` - Category description
+- `technologies` - Array of technology names in the category
+- `origins` - Array of origin companies/organizations
 
-### Usage
+Get only category names:
 
-#### Basic Syntax
+```http
+GET /v1/categories?fields=category
+```
 
-Add a `fields` parameter to your request with comma-separated field names:
+Get categories with descriptions:
 
-```
-GET /v1/technologies?fields=technology,category
+```http
 GET /v1/categories?fields=category,description
 ```
 
-#### Examples
+### Technologies Endpoint
 
-##### Technologies Endpoint
+- `technology` - Technology name
+- `category` - Category name
+- `description` - Technology description
+- `icon` - Icon filename
+- `origins` - Array of origin companies/organizations
 
-**Get only technology names and categories:**
+Get only technology names and categories:
 
-```
+```http
 GET /v1/technologies?fields=technology,category
 ```
 
@@ -435,77 +543,68 @@ Response:
 }
 ```
 
-**Get technology names and descriptions:**
-
-```
-GET /v1/technologies?fields=technology,description
-```
-
-**Combine with existing filters:**
+Combine with existing filters:
 
-```
+```http
 GET /v1/technologies?category=JavaScript%20Frameworks&fields=technology,icon
 ```
 
-##### Categories Endpoint
-
-**Get only category names:**
-
-```
-GET /v1/categories?fields=category
-```
-
-**Get categories with descriptions:**
-
-```
-GET /v1/categories?fields=category,description
-```
-
-#### Behavior Notes
-
-1. **Field Priority**: The `fields` parameter takes precedence over other response formatting options, except for `onlyname`
-2. **Invalid Fields**: Non-existent fields are silently ignored
-3. **Empty Fields**: If no valid fields are specified, the full object is returned
-4. **Backward Compatibility**: When `fields` is not specified, endpoints return their default response format
-5. **onlyname Override**: The `onlyname` parameter still takes precedence over `fields` for backward compatibility
-
-#### Available Fields
-
-##### Technologies Endpoint
+### Versions Endpoint
 
 - `technology` - Technology name
-- `category` - Category name
-- `description` - Technology description
-- `icon` - Icon filename
-- `origins` - Array of origin companies/organizations
-
-##### Categories Endpoint
-
-- `category` - Category name
-- Additional fields depend on your data structure
+- `version` - Version name
+- `origins` - Mobile and desktop origins
 
-#### Error Handling
+Get only technology and version names:
 
-The field selection feature handles errors gracefully:
+```http
+GET /v1/versions?fields=technology,version
+```
 
-- Invalid field names are ignored
-- Empty field lists return full objects
-- Malformed field parameters fallback to default behavior
+Response:
 
-#### Performance Benefits
+```json
+{
+  {
+    "technology": "React",
+    "version": "18.2.0"
+  },
+  {
+    "technology": "Angular",
+    "version": "12.0.0"
+  },
+  ...
+}
+```
 
-- **Reduced Payload Size**: Only requested fields are included
-- **Faster Parsing**: Clients process smaller JSON objects
-- **Bandwidth Savings**: Less data transferred over the network
-- **Improved Caching**: More specific responses can be cached more effectively
+## Cache Stats Private Endpoint
 
-#### Migration Guide
+The Cache Stats private endpoint provides information about the API's cache performance, including cache hits, misses, and the last time the cache was cleared. This endpoint is useful for monitoring and debugging cache behavior.
 
-Existing API consumers are not affected by this change. The field selection feature is entirely opt-in through the `fields` parameter.
+```bash
+curl "https://tech-report-api-dev-226352634162.us-central1.run.app/v1/cache-stats" \
+  -H "Authorization: bearer $(gcloud auth print-identity-token)"
+```
 
-To adopt field selection:
+Returns a JSON object with the following schema:
 
-1. Identify which fields your application actually uses
-2. Add the `fields` parameter with those field names
-3. Update your client code to handle the new response structure
-4. Test thoroughly with your specific use cases
+```json
+{
+  "queryCache": {
+    "total": 3220,
+    "valid": 2437,
+    "expired": 783,
+    "ttl": 3600000
+  },
+  "dateCache": {
+    "total": 4,
+    "valid": 4,
+    "expired": 0,
+    "ttl": 3600000
+  },
+  "config": {
+    "maxCacheSize": 5000,
+    "cleanupStrategy": "size-based-lru"
+  }
+}
+```
diff --git a/src/controllers/versionsController.js b/src/controllers/versionsController.js
index 5c33bc9..426366d 100644
--- a/src/controllers/versionsController.js
+++ b/src/controllers/versionsController.js
@@ -22,41 +22,33 @@ const listVersions = async (req, res) => {
 
     let query = firestore.collection('versions');
 
-    // Apply technology filter - optimize for multiple technologies
+    // Apply technology filter
     if (params.technology) {
       const technologies = convertToArray(params.technology);
       if (technologies.length <= 30) {
         // Use single query with 'in' operator for up to 30 technologies (Firestore limit)
         query = query.where('technology', 'in', technologies);
       } else {
-        // For more than 30 technologies, split into multiple queries and run in parallel
-        const chunks = [];
-        for (let i = 0; i < technologies.length; i += 30) {
-          chunks.push(technologies.slice(i, i + 30));
-        }
-
-        const promises = chunks.map(chunk =>
-          firestore.collection('versions').where('technology', 'in', chunk).get()
-        );
-
-        const snapshots = await Promise.all(promises);
-        const data = [];
-
-        snapshots.forEach(snapshot => {
-          snapshot.forEach(doc => {
-            data.push(doc.data());
-          });
-        });
-
-        // Cache the result
-        setCachedQueryResult(cacheKey, data);
-
-        res.statusCode = 200;
-        res.end(JSON.stringify(data));
+        res.statusCode = 400;
+        res.end(JSON.stringify({
+          success: false,
+          errors: [{ technology: 'Too many technologies specified. Maximum 30 allowed.' }]
+        }));
         return;
       }
     }
 
+    // Apply version filter
+    if (params.version) {
+      query = query.where('version', '==', params.version);
+    }
+
+    // Only select requested fields if specified
+    if (params.fields) {
+      const requestedFields = params.fields.split(',').map(f => f.trim());
+      query = query.select(...requestedFields);
+    }
+
     // Execute single query
     const snapshot = await query.get();
     const data = [];
diff --git a/src/package.json b/src/package.json
index ca8a44a..55023c3 100644
--- a/src/package.json
+++ b/src/package.json
@@ -8,7 +8,7 @@
     "node": ">=22.0.0"
   },
   "scripts": {
-    "start": "export DATABASE=tech-report-api-prod &&node index.js",
+    "start": "export DATABASE=tech-report-api-prod && node index.js",
     "test": "node --experimental-vm-modules node_modules/jest/bin/jest.js",
     "test:live": "bash ../test-api.sh"
   },
diff --git a/src/utils/controllerHelpers.js b/src/utils/controllerHelpers.js
index 31e1c00..523c3c8 100644
--- a/src/utils/controllerHelpers.js
+++ b/src/utils/controllerHelpers.js
@@ -316,7 +316,7 @@ const getCacheStats = () => {
       ttl: CACHE_TTL
     },
     config: {
-      maxCacheSize: MAX_CACHE_SIZE,
+      maxQueryCacheSize: MAX_CACHE_SIZE,
       cleanupStrategy: 'size-based-lru'
     }
   };
diff --git a/terraform/dev/main.tf b/terraform/dev/main.tf
index f73acbb..037840f 100644
--- a/terraform/dev/main.tf
+++ b/terraform/dev/main.tf
@@ -13,8 +13,8 @@ provider "google" {
 
 resource "google_api_gateway_api" "api" {
   provider = google-beta
-  api_id = "reports-api"
-  display_name = "Reports API Gateway"
+  api_id = "reports-api-dev"
+  display_name = "Reports API Gateway DEV"
   project = var.project
 }
 
diff --git a/terraform/dev/variables.tf b/terraform/dev/variables.tf
index 4dd84ad..77e6d54 100644
--- a/terraform/dev/variables.tf
+++ b/terraform/dev/variables.tf
@@ -30,5 +30,5 @@ variable "google_service_account_api_gateway" {
 variable "min_instances" {
   description = "(Optional) The limit on the minimum number of function instances that may coexist at a given time."
   type = number
-  default = 1 // TODO: Update this to 0
+  default = 0
 }
diff --git a/terraform/modules/run-service/variables.tf b/terraform/modules/run-service/variables.tf
index 5ef5596..9f77905 100644
--- a/terraform/modules/run-service/variables.tf
+++ b/terraform/modules/run-service/variables.tf
@@ -71,7 +71,7 @@ variable "min_instances" {
 variable "max_instance_request_concurrency" {
   description = "(Optional) The limit on the maximum number of requests that an instance can handle simultaneously. This can be used to control costs when scaling. Defaults to 1."
   type = number
-  default = 18
+  default = 80
 }
 variable "environment_variables" {
   description = "environment_variables"
diff --git a/terraform/prod/main.tf b/terraform/prod/main.tf
index 5a08aba..50c36a0 100644
--- a/terraform/prod/main.tf
+++ b/terraform/prod/main.tf
@@ -14,7 +14,7 @@ provider "google" {
 resource "google_api_gateway_api" "api" {
   provider = google-beta
   api_id = "reports-api-prod"
-  display_name = "Reports API Gateway"
+  display_name = "Reports API Gateway PROD"
   project = var.project
 }
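
Review note on the `versionsController.js` hunk: the new `fields` handling splits and trims the parameter but passes every entry to `query.select(...)`, so an input such as `fields=technology,,version,` would include empty names. The sketch below is one possible way to normalize the comma-separated parameters and apply the 30-value limit before building the query. The helper names `parseList` and `validateVersionParams` and the constant `MAX_IN_VALUES` are hypothetical and not part of this change; dropping empty entries is an assumption about the intended behavior.

```javascript
// Hypothetical helpers for illustration; they do not exist in the repository.
const MAX_IN_VALUES = 30; // limit for the 'in' operator, as cited in the controller comment

// Split a comma-separated query value into trimmed, non-empty entries.
function parseList(value) {
  if (typeof value !== 'string') return [];
  return value.split(',').map(s => s.trim()).filter(Boolean);
}

// Normalize listVersions-style params and collect validation errors
// in the same shape the controller uses for its 400 response.
function validateVersionParams(params) {
  const technologies = parseList(params.technology);
  const fields = parseList(params.fields);
  const errors = [];
  if (technologies.length > MAX_IN_VALUES) {
    errors.push({
      technology: `Too many technologies specified. Maximum ${MAX_IN_VALUES} allowed.`
    });
  }
  return { ok: errors.length === 0, errors, technologies, fields };
}

const result = validateVersionParams({
  technology: 'WordPress, React',
  fields: 'technology,,version,'
});
console.log(result.fields); // logs [ 'technology', 'version' ]
```

Keeping the parsing in a pure function like this makes the limit and the empty-entry handling testable without a Firestore connection; whether the controller should reject or silently drop empty field names is left open here.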