
Commit 5763007

Builds a merge tool in Golang; validates everything works as expected
Instead of using a shell script to manage the merge CLI tool, I built a Golang app around it for easier use, updating, management, and testing. The project is functional and can be used as-is today, including uploading to R2. S3 uploads should work too, but I haven't tried them.
1 parent 36cd602 commit 5763007


20 files changed: +1396 / -34 lines


.dockerignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+.env
```

.github/workflows/test.yml

Lines changed: 56 additions & 0 deletions
```diff
@@ -0,0 +1,56 @@
+name: Test
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+env:
+  GO_VERSION: '1.23'
+
+jobs:
+  unit-tests:
+    name: Unit Tests
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ env.GO_VERSION }}
+
+      - name: Cache Go modules
+        uses: actions/cache@v3
+        with:
+          path: ~/go/pkg/mod
+          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
+          restore-keys: |
+            ${{ runner.os }}-go-
+
+      - name: Download dependencies
+        working-directory: ./merge
+        run: go mod download
+
+      - name: Run unit tests
+        working-directory: ./merge
+        run: make test
+
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ env.GO_VERSION }}
+
+      - name: golangci-lint
+        uses: golangci/golangci-lint-action@v3
+        with:
+          version: latest
+          working-directory: ./merge
+          args: --timeout=5m
```

.gitignore

Lines changed: 3 additions & 1 deletion
```diff
@@ -1 +1,3 @@
-.claude
+.claude
+.env
+merge/gtfs-merge
```

CLAUDE.md

Lines changed: 62 additions & 13 deletions
````diff
@@ -4,36 +4,85 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 ## Project Overview

-This is a Docker-based service that wraps the OneBusAway GTFS Merge CLI tool. It merges multiple GTFS (General Transit Feed Specification) feeds into a single feed and can upload the result to AWS S3.
+This is a Docker-based service that merges multiple GTFS (General Transit Feed Specification) feeds into a single feed and can upload the result to AWS S3. It uses the OneBusAway GTFS Merge CLI tool internally, wrapped in a Go application for better configuration management and error handling.

 ## Key Commands

+### Build and run locally (for development)
+```bash
+cd merge
+make test-unit # Run unit tests
+make test-integration # Run integration tests (requires JAR)
+go run cmd/gtfs-merge/main.go --config ../example-configs/puget-sound.json
+```
+
 ### Build the Docker image
 ```bash
-docker build --tag oba-merge-service .
+docker build --tag gtfs-merge-service .
 ```

 ### Run the container
 ```bash
-docker run oba-merge-service
+docker run -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=yyy \
+  -v $(pwd)/config.json:/config.json \
+  gtfs-merge-service --config /config.json
 ```

 ## Architecture

 The service consists of:

-1. **Dockerfile**: Multi-architecture Docker image based on Eclipse Temurin Java 17 JRE that:
-   - Downloads the OneBusAway GTFS Merge CLI JAR from Maven Central (version 9.0.1)
-   - Installs AWS CLI v2 for S3 uploads (supports both x86_64 and aarch64 architectures)
-   - Sets up the merge.sh script as the entrypoint
+1. **Go Application** (`merge/`):
+   - `cmd/gtfs-merge/main.go`: Entry point that orchestrates the merge process
+   - `internal/config/`: Configuration parsing and validation
+   - `internal/download/`: GTFS feed downloading logic
+   - `internal/merge/`: OneBusAway JAR execution wrapper
+   - `internal/validate/`: GTFS feed validation
+   - `internal/upload/`: S3 upload functionality
+
+2. **Dockerfile**: Multi-stage build that:
+   - Builds the Go binary in an Alpine container
+   - Creates a runtime image with Java 17 JRE (for OneBusAway JAR)
+   - Downloads the OneBusAway GTFS Merge CLI JAR from Maven Central
+   - Installs AWS CLI v2 for S3 operations (supports both x86_64 and aarch64)
+
+3. **Configuration**: JSON-based configuration that specifies:
+   - Input GTFS feed URLs
+   - Agency renaming rules
+   - Output file location (local or S3)
+   - Optional validation settings

-2. **merge.sh**: Main execution script that handles the GTFS merge process and coordinates between the Java CLI tool and AWS S3 operations
+## Configuration Format

-3. **install-awscli.sh**: Helper script that detects system architecture and installs the appropriate AWS CLI v2 version
+```json
+{
+  "feeds": [
+    {
+      "url": "https://example.com/gtfs.zip",
+      "agencyIdMapping": {"old_id": "new_id"}
+    }
+  ],
+  "output": {
+    "type": "s3",
+    "bucket": "my-bucket",
+    "key": "merged.zip"
+  },
+  "validate": true
+}
+```
+
+## Testing
+
+```bash
+cd merge
+make test-unit # Unit tests only
+make test-integration # Integration tests (requires JAR)
+```

 ## Important Notes

-- The JAR version is parameterized in the Dockerfile as `JAR_VERSION` (default: 9.0.1)
-- The service expects to work with static GTFS feeds and merge instructions
-- Output can be uploaded to S3-compatible storage services
-- The merge.sh script is the main entry point and should contain the logic for downloading feeds, merging them, and uploading results
+- The OneBusAway JAR version is set to 9.0.1 (configurable via `JAR_VERSION` build arg)
+- The service validates GTFS feeds before and after merging when configured
+- S3 uploads require AWS credentials via environment variables or IAM role
+- Output can be uploaded to S3-compatible storage services; requires AWS credentials set via .env
+- The Go binary handles all orchestration; the JAR is only used for the actual merge operation
````
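The notes above describe `internal/merge/` only as a "OneBusAway JAR execution wrapper". The following is a minimal sketch of what such a wrapper might assemble. It assumes the merge CLI's `--file=<name>` / `--duplicateDetection=<strategy>` option pairs, followed by positional input feeds and then the output path. It also takes the strategy values `identity`, `fuzzy`, and `none` from the example configs. `buildMergeArgs` is a hypothetical name, and the commit's actual package may build its command differently.

```go
package main

import (
	"fmt"
	"os/exec"
	"sort"
)

// buildMergeArgs is a hypothetical helper that assembles arguments for
// `java -jar <jarPath> [--file=X --duplicateDetection=Y]... <inputs...> <output>`.
// Strategy values such as "identity", "fuzzy", or "none" are passed through unchanged.
func buildMergeArgs(jarPath string, strategies map[string]string, inputs []string, output string) []string {
	args := []string{"-jar", jarPath}

	// Sort file names so the command is deterministic (Go map iteration order is random).
	files := make([]string, 0, len(strategies))
	for f := range strategies {
		files = append(files, f)
	}
	sort.Strings(files)

	for _, f := range files {
		args = append(args, "--file="+f, "--duplicateDetection="+strategies[f])
	}

	// Input feeds come first; the merged output path is always last.
	args = append(args, inputs...)
	return append(args, output)
}

func main() {
	args := buildMergeArgs(
		"/app/merge-cli.jar", // path where the Dockerfile saves the JAR
		map[string]string{"stops.txt": "fuzzy", "agency.txt": "identity"},
		[]string{"feed1.zip", "feed2.zip"},
		"merged.zip",
	)
	// exec.Command only prepares the command; cmd.Run() would actually launch Java.
	cmd := exec.Command("java", args...)
	fmt.Println(cmd.Args)
}
```

Keeping argument construction separate from execution lets the wrapper be unit-tested without a JRE or the JAR present.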

Dockerfile

Lines changed: 22 additions & 4 deletions
```diff
@@ -1,3 +1,19 @@
+# Stage 1: Build the Go binary
+FROM golang:1.23-alpine AS builder
+
+WORKDIR /build
+
+# Copy go mod files
+COPY merge/go.mod merge/go.sum ./
+RUN go mod download
+
+# Copy source code
+COPY merge/ ./
+
+# Build the binary
+RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o gtfs-merge cmd/gtfs-merge/main.go
+
+# Stage 2: Runtime image
 FROM eclipse-temurin:17-jre

 ARG JAR_VERSION=9.0.1
@@ -20,13 +36,15 @@ RUN /tmp/install-awscli.sh && \
 # Set working directory
 WORKDIR /app

-COPY merge.sh ./merge.sh
-RUN chmod +x ./merge.sh
+# Copy Go binary from builder
+COPY --from=builder /build/gtfs-merge /app/gtfs-merge
+RUN chmod +x /app/gtfs-merge

+# Download the OneBusAway merge CLI JAR
 RUN curl \
     -L https://repo1.maven.org/maven2/org/onebusaway/onebusaway-gtfs-merge-cli/${JAR_VERSION}/onebusaway-gtfs-merge-cli-${JAR_VERSION}.jar \
     -o merge-cli.jar

-# Use ENTRYPOINT for the main script so it always runs; CMD can be used to pass arguments or be overridden
-ENTRYPOINT ["/app/merge.sh"]
+# Use ENTRYPOINT for the Go binary
+ENTRYPOINT ["/app/gtfs-merge"]
 CMD []
```
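The runtime image still installs AWS CLI v2 for S3 operations. If the Go binary shells out to that CLI, an upload to R2 or another S3-compatible store could look like the sketch below. The sketch assumes three things: the upload uses `aws s3 cp`, it passes the CLI's global `--endpoint-url` option, and the bucket and endpoint come from the `S3_BUCKET` and `AWS_ENDPOINT_URL` variables described in the README. `buildUploadArgs` is a hypothetical name, and the commit's `internal/upload/` package may use a different mechanism entirely.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// buildUploadArgs is a hypothetical helper that builds arguments for
// `aws s3 cp <localPath> s3://<bucket>/<key> [--endpoint-url <endpoint>]`.
// An empty endpoint leaves the CLI pointed at its default (AWS S3).
func buildUploadArgs(localPath, bucket, key, endpoint string) []string {
	// Normalize slashes so "bucket/" + "/key" does not produce "bucket//key".
	dest := "s3://" + strings.TrimSuffix(bucket, "/") + "/" + strings.TrimPrefix(key, "/")
	args := []string{"s3", "cp", localPath, dest}
	if endpoint != "" {
		// For Cloudflare R2 this would be the account endpoint URL.
		args = append(args, "--endpoint-url", endpoint)
	}
	return args
}

func main() {
	args := buildUploadArgs("merged.zip", os.Getenv("S3_BUCKET"), "merged.zip", os.Getenv("AWS_ENDPOINT_URL"))
	// Building the command does not run it; cmd.Run() would perform the upload,
	// with the CLI reading AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
	cmd := exec.Command("aws", args...)
	fmt.Println(strings.Join(cmd.Args, " "))
}
```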

README.markdown

Lines changed: 48 additions & 1 deletion
````diff
@@ -14,10 +14,57 @@ This tool can be used to automate the creation of a merged static GTFS bundle wh
 docker build --tag oba-merge-service .
 ```

+## Set environment variables
+
+The service requires several environment variables to be set:
+
+- `AWS_ACCESS_KEY_ID`: AWS/R2 access key ID
+- `AWS_SECRET_ACCESS_KEY`: AWS/R2 secret access key
+- `AWS_ENDPOINT_URL`: S3-compatible endpoint URL (or Cloudflare R2)
+- `S3_BUCKET`: Destination bucket for the merged GTFS feed
+- `ALLOWED_DOMAINS`: Comma-separated list of allowed domains for config and feed URLs (security feature)
+
+Copy env.example to .env and fill in the file.
+
+```
+AWS_ACCESS_KEY_ID=your-access-key
+AWS_SECRET_ACCESS_KEY=your-secret-key
+AWS_ENDPOINT_URL=https://your-account.r2.cloudflarestorage.com
+S3_BUCKET=your-bucket-name
+ALLOWED_DOMAINS=example.com,transit.agency.gov
+```
+
 ## Run the container

 ```bash
-docker run oba-merge-service
+docker run \
+  --env-file .env \
+  -v ./example-configs:/config \
+  oba-merge-service -config-path /config/puget-sound.json
+```
+
+### Configuration File Format
+
+The service expects a JSON configuration file with the following structure:
+
+```json
+{
+  "feeds": [
+    "https://example.com/gtfs/feed1.zip",
+    "https://example.com/gtfs/feed2.zip"
+  ],
+  "mergeStrategies": {
+    "agency.txt": "identity",
+    "stops.txt": "fuzzy",
+    "routes.txt": "fuzzy",
+    "trips.txt": "identity",
+    "stop_times.txt": "identity",
+    "calendar.txt": "identity",
+    "shapes.txt": "fuzzy",
+    "transfers.txt": "none"
+  },
+  "outputName": "merged-gtfs.zip"
+}
 ```

 # Apache 2.0 License
````
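The README describes `ALLOWED_DOMAINS` only as a comma-separated allow-list for config and feed URLs. One plausible reading is an exact-host or subdomain match against each URL's hostname, restricted to http(s). The sketch below works under that assumption. `parseAllowed` and `isAllowed` are hypothetical helpers, and the commit does not say whether subdomains are accepted.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseAllowed splits a comma-separated ALLOWED_DOMAINS value into lowercase
// hosts, trimming whitespace and dropping empty entries.
func parseAllowed(raw string) []string {
	var out []string
	for _, d := range strings.Split(raw, ",") {
		if d = strings.ToLower(strings.TrimSpace(d)); d != "" {
			out = append(out, d)
		}
	}
	return out
}

// isAllowed reports whether rawURL uses http(s) and its host equals an allowed
// domain or is a subdomain of one (assumption: subdomains are permitted).
func isAllowed(rawURL string, allowed []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return false
	}
	host := strings.ToLower(u.Hostname()) // Hostname() strips any :port
	for _, d := range allowed {
		// The leading "." stops "notexample.com" from matching "example.com".
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowed("localhost, metro.kingcounty.gov,business.wsdot.wa.gov")
	fmt.Println(isAllowed("https://metro.kingcounty.gov/gtfs/google_transit.zip", allowed)) // true
	fmt.Println(isAllowed("https://metro.kingcounty.gov.evil.example/feed.zip", allowed))   // false
}
```

Matching on the parsed hostname rather than on a raw string prefix avoids accepting look-alike hosts that merely start with an allowed domain.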

env.example

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+AWS_ACCESS_KEY_ID=
+AWS_SECRET_ACCESS_KEY=
+AWS_ENDPOINT_URL=
+S3_BUCKET=
+ALLOWED_DOMAINS=localhost,metro.kingcounty.gov,business.wsdot.wa.gov
```
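`env.example` leaves the credentials blank, so an unfilled `.env` might only surface when the upload fails. The following is a minimal sketch of checking these variables at startup instead. It assumes all five are mandatory, as the README's "requires several environment variables" suggests. `missingEnv` is a hypothetical helper; `os.LookupEnv` is injected so the check can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// missingEnv returns the names of variables that are unset or contain only whitespace.
func missingEnv(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, n := range names {
		if v, ok := lookup(n); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	required := []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL", "S3_BUCKET", "ALLOWED_DOMAINS"}
	if m := missingEnv(os.LookupEnv, required...); len(m) > 0 {
		// A real entry point would likely exit non-zero here.
		fmt.Println("missing required environment variables:", strings.Join(m, ", "))
		return
	}
	fmt.Println("environment OK")
}
```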

example-configs/puget-sound.json

Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+{
+  "feeds": [
+    "https://metro.kingcounty.gov/gtfs/google_transit.zip",
+    "https://business.wsdot.wa.gov/Transit/csv_files/wsf/google_transit.zip"
+  ],
+  "mergeStrategies": {
+    "agency.txt": "identity",
+    "stops.txt": "fuzzy",
+    "routes.txt": "fuzzy",
+    "trips.txt": "identity",
+    "stop_times.txt": "identity",
+    "calendar.txt": "identity",
+    "shapes.txt": "fuzzy",
+    "transfers.txt": "none"
+  },
+  "outputName": "puget-sound-merged-gtfs.zip"
+}
```
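This example config maps each GTFS file name to a duplicate-detection strategy. The sketch below loads it in Go under three assumptions: the three top-level keys shown here are the whole schema, only the values `identity`, `fuzzy`, and `none` are valid, and a merge needs at least two feeds. `Config` and `parseConfig` are hypothetical names, not necessarily what `internal/config/` defines.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors the JSON layout of example-configs/puget-sound.json.
type Config struct {
	Feeds           []string          `json:"feeds"`
	MergeStrategies map[string]string `json:"mergeStrategies"`
	OutputName      string            `json:"outputName"`
}

// validStrategies lists the values used in the example configs (assumption).
var validStrategies = map[string]bool{"identity": true, "fuzzy": true, "none": true}

// parseConfig decodes a config document and performs basic sanity checks.
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("decode config: %w", err)
	}
	// Assumption: merging only makes sense with two or more input feeds.
	if len(c.Feeds) < 2 {
		return nil, errors.New("config: at least two feeds are needed to merge")
	}
	if c.OutputName == "" {
		return nil, errors.New("config: outputName is required")
	}
	for file, s := range c.MergeStrategies {
		if !validStrategies[s] {
			return nil, fmt.Errorf("config: unknown strategy %q for %s", s, file)
		}
	}
	return &c, nil
}

func main() {
	raw := []byte(`{
		"feeds": ["https://metro.kingcounty.gov/gtfs/google_transit.zip",
		          "https://business.wsdot.wa.gov/Transit/csv_files/wsf/google_transit.zip"],
		"mergeStrategies": {"stops.txt": "fuzzy", "transfers.txt": "none"},
		"outputName": "puget-sound-merged-gtfs.zip"
	}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(c.Feeds), c.MergeStrategies["stops.txt"], c.OutputName)
}
```

Rejecting unknown strategy names at load time catches a typo like `"fuzy"` before any feed is downloaded or the JAR is started.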

merge.sh

Lines changed: 0 additions & 15 deletions
This file was deleted.

merge/Makefile

Lines changed: 70 additions & 0 deletions
```diff
@@ -0,0 +1,70 @@
+.PHONY: all build clean test test-coverage lint fmt vet docker-build deps deps-check help
+
+# Variables
+BINARY_NAME=gtfs-merge
+DOCKER_IMAGE=oba-merge-service
+GO=go
+GOTEST=$(GO) test
+GOBUILD=$(GO) build
+GOCLEAN=$(GO) clean
+
+# Default target
+all: build
+
+# Build the binary
+build:
+	$(GOBUILD) -o $(BINARY_NAME) -v ./cmd/gtfs-merge
+
+# Clean build artifacts
+clean:
+	$(GOCLEAN)
+	rm -f $(BINARY_NAME)
+
+# Run unit tests
+test:
+	$(GOTEST) ./...
+
+# Run tests with coverage
+test-coverage:
+	$(GOTEST) -v -coverprofile=coverage.out ./...
+	$(GO) tool cover -html=coverage.out -o coverage.html
+	@echo "Coverage report generated: coverage.html"
+
+# Lint the code
+lint:
+	@which golangci-lint > /dev/null || (echo "golangci-lint not found, installing..." && go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest)
+	golangci-lint run ./...
+
+# Format code
+fmt:
+	$(GO) fmt ./...
+
+# Vet code
+vet:
+	$(GO) vet ./...
+
+# Docker targets
+docker-build:
+	docker build --tag $(DOCKER_IMAGE) -f ../Dockerfile ..
+
+# Install dependencies
+deps:
+	$(GO) mod download
+	$(GO) mod tidy
+
+# Check for outdated dependencies
+deps-check:
+	$(GO) list -u -m all
+
+# Help target
+help:
+	@echo "Available targets:"
+	@echo " make build - Build the binary"
+	@echo " make test - Run unit tests"
+	@echo " make lint - Lint the code"
+	@echo " make fmt - Format the code"
+	@echo " make vet - Vet the code"
+	@echo " make docker-build - Build Docker image"
+	@echo " make clean - Clean build artifacts"
+	@echo " make deps - Install dependencies"
+	@echo " make help - Show this help message"
```
