brunns/rss-agg

RSS aggregator

Aggregate, de-duplicate and republish RSS feeds.

The problem I'm solving here is this: if I subscribe to the main Guardian RSS feed, I see a great many articles I'm not interested in [1]. But if I subscribe to the feeds for individual tags instead, I avoid the things I'm not interested in but see a great many duplicates, because articles with multiple tags show up in multiple feeds. This little app gives me the best of both worlds: my reader [3] shows only [2] the articles I'm interested in, and each of them only once.

Prerequisites

Local development requires:

Some tasks may require additional tools:

On a Mac, you can install most of these with homebrew and asdf.

brew install uv xc gh 1password-cli awscli asdf colima libxml2 fzf gum # And follow any additional setup instructions brew gives you
asdf plugin add terraform
asdf plugin add nodejs
asdf install

Design

The application is a Flask async web app deployed as an AWS Lambda function running within the Lambda Web Adapter.

Good places to start investigating the code are:

  • web.py, the application entry point.
  • routes.py, the Flask route definitions.

On each request, RSSService orchestrates the full pipeline:

Services are wired together with wireup for dependency injection, with configuration sourced from environment variables as per the twelve-factor app. AWS API Gateway provides the public HTTP endpoint, backed by Terraform-managed infrastructure-as-code located in terraform/.
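To make the aggregate-and-de-duplicate idea concrete, here is a minimal sketch in Python. It is not the project's RSSService: the Item class, the merge_items function, and the choice of the item guid as the de-duplication key are all assumptions made for illustration. The max_items parameter mirrors the MAX_ITEMS setting.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a parsed RSS item (not the project's model)."""

    guid: str
    title: str
    published: datetime


def merge_items(feeds: list[list[Item]], max_items: int = 50) -> list[Item]:
    """Merge several feeds, keep one copy per guid, and return the newest first."""
    seen: dict[str, Item] = {}
    for feed in feeds:
        for item in feed:
            # The first occurrence of a guid wins; later duplicates are dropped.
            seen.setdefault(item.guid, item)
    newest_first = sorted(seen.values(), key=lambda i: i.published, reverse=True)
    return newest_first[:max_items]


def _day(d: int) -> datetime:
    """Helper for the example data below."""
    return datetime(2024, 1, d, tzinfo=timezone.utc)


# Two tag feeds that share one article ("g2").
politics = [Item("g1", "Budget", _day(1)), Item("g2", "Election", _day(2))]
europe = [Item("g2", "Election", _day(2)), Item("g3", "Summit", _day(3))]
example = merge_items([politics, europe])
```

Here example holds three items rather than four, because the article tagged in both feeds appears only once.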

Configuration

Configuration is sourced from environment variables:

| Variable | Description | Default |
|---|---|---|
| FEEDS_SERVICE | Feeds service implementation: FileFeedsService or S3FeedsService | FileFeedsService |
| FEEDS_FILE | Path to the feeds list file (used by FileFeedsService) | feeds.txt |
| MAX_ITEMS | Maximum number of items in the output feed | 50 |
| MAX_CONNECTIONS | Maximum number of concurrent HTTP connections | 16 |
| MAX_KEEPALIVE_CONNECTIONS | Maximum number of keep-alive HTTP connections | 16 |
| KEEPALIVE_EXPIRY | Keep-alive connection expiry in seconds | 5 |
| RETRIES | Number of HTTP retry attempts per feed | 3 |
| TIMEOUT | HTTP request timeout in seconds | 3 |
| LOG_LEVEL | Log verbosity: ERROR, WARNING, INFO, or DEBUG | INFO |
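As an illustration of environment-driven configuration with the defaults listed above, here is a minimal sketch. The Settings class and load_settings function are hypothetical names, not the project's code, and the choice of int versus float for each field is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder; defaults copied from the table above."""

    feeds_service: str = "FileFeedsService"
    feeds_file: str = "feeds.txt"
    max_items: int = 50
    max_connections: int = 16
    max_keepalive_connections: int = 16
    keepalive_expiry: float = 5.0
    retries: int = 3
    timeout: float = 3.0
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the defaults."""
    values = {}
    for name, default in asdict(Settings()).items():
        raw = env.get(name.upper())
        # Convert the raw string using the type of the default (str, int or float).
        values[name] = default if raw is None else type(default)(raw)
    return Settings(**values)
```

Passing a plain dict instead of os.environ, for example load_settings({"MAX_ITEMS": "10"}), makes the behaviour easy to exercise in tests without touching the real environment.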

The following are additionally required when FEEDS_SERVICE=S3FeedsService:

| Variable | Description | Default |
|---|---|---|
| FEEDS_BUCKET_NAME | S3 bucket name containing the feeds list | brunns-rss-agg-feeds |
| FEEDS_OBJECT_NAME | S3 object key for the feeds list | feeds.txt |
| AWS_DEFAULT_REGION | AWS region for S3 access | boto3 default |
| AWS_ACCESS_KEY_ID | AWS access key ID for S3 access | boto3 default |
| AWS_SECRET_ACCESS_KEY | AWS secret access key for S3 access | boto3 default |
| S3_ENDPOINT | Custom S3-compatible endpoint URL, e.g. for local testing | AWS S3 |
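Whichever feeds service is selected, the result is a list of feed URLs. The sketch below shows one way the file-backed variant could look. It assumes that the feeds list holds one URL per line and that blank lines and lines starting with # are ignored; the README does not specify the format, so treat both as guesses. The names parse_feed_list, FeedsSource and FileFeedsSketch are hypothetical.

```python
from pathlib import Path
from typing import Protocol


def parse_feed_list(text: str) -> list[str]:
    """Return the feed URLs in text, assuming one URL per line (an assumption)."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (comment support is an assumption).
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


class FeedsSource(Protocol):
    """Hypothetical interface shared by file-backed and S3-backed sources."""

    def feed_urls(self) -> list[str]: ...


class FileFeedsSketch:
    """Reads the feeds list from a local file, as FEEDS_FILE suggests."""

    def __init__(self, path: str = "feeds.txt") -> None:
        self.path = Path(path)

    def feed_urls(self) -> list[str]:
        return parse_feed_list(self.path.read_text(encoding="utf-8"))
```

An S3-backed source could fetch the object named by FEEDS_BUCKET_NAME and FEEDS_OBJECT_NAME and hand its decoded body to the same parse_feed_list function, so that only the retrieval step differs between the two.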

Tasks

These tasks can be run using xc.

pc

Precommit tasks

Requires: test, lint, audit

RunDeps: async

#!/usr/bin/env python
import this

cli

Run CLI - outputs RSS to stdout

uv run cli -vv

web

Run web server

./run.sh

test

Run all tests

Requires: unit, integration

RunDeps: async

unit

Unit tests

uv run pytest tests/unit/ --durations=10 --cov-report term-missing --cov-fail-under 100 --cov src

integration

Integration tests

if command -v colima > /dev/null; then colima status || colima start; fi
uv run pytest tests/integration/ -s --durations=10

format

Format code

uv run ruff format .
uv run ruff check . --fix-only

lint

Code quality & security checks

Requires: lint-code, type-checking

RunDeps: async

lint-code

Lint code

uv run ruff format . --check
uv run ruff check .

type-checking

Type checking

uv run pyright

audit

Audit for known vulnerabilities

Requires: audit-py, audit-gha

RunDeps: async

audit-py

Audit Python dependencies for known vulnerabilities

uv audit

audit-gha

Scan GitHub Actions for vulnerabilities

uvx zizmor -o .

build

Build lambda image

Inputs: IMAGE_NAME

Environment: IMAGE_NAME=deployment_package.zip

rm -rf build/ terraform/"$IMAGE_NAME"
uv export --no-dev --python 3.14 --format requirements-txt --output-file requirements.txt
uv pip install -r requirements.txt --target build --python 3.14
cp -r src/rss_agg build/
cp run.sh build/
cp feeds.txt build/
chmod +x build/run.sh
cd build
zip -r ../terraform/"$IMAGE_NAME" .
cd ..

terraform-init

Initialise terraform

Directory: ./terraform

terraform init

plan

Plan infrastructure changes

Requires: build, terraform-init

RunDeps: async

Directory: ./terraform

terraform plan

push

Push to origin, and monitor CI workflow

Requires: pc

Inputs: WORKFLOW

Environment: WORKFLOW=ci.yml

git push
sleep 5
RUN_ID=$(gh run list --workflow="$WORKFLOW" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status

deploy

Run deployment workflow

Inputs: WORKFLOW

Environment: WORKFLOW=cd.yml

gh workflow run "$WORKFLOW"
sleep 5
RUN_ID=$(gh run list --workflow="$WORKFLOW" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status

healthcheck

Check feed is running and returning XML

Inputs: API_URL

Environment: API_URL=http://0.0.0.0:8080

set +x
echo "Testing API at: $API_URL"

# curl retries up to 5 times, 5 seconds apart; -f makes it fail on HTTP error statuses (4xx/5xx)
if curl -fsSL --retry 5 --retry-delay 5 "$API_URL" | xmllint --noout - 2>/dev/null; then
    echo "✓ API returned valid XML"
else
    echo "✗ API check failed (ensure xmllint is installed: brew install libxml2)"
    exit 1
fi
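For platforms where xmllint is not readily available, roughly the same check can be sketched with the Python standard library alone. This is an illustrative alternative, not part of the project: the function names are hypothetical, and unlike the curl command above it does not retry.

```python
import urllib.request
import xml.etree.ElementTree as ET


def is_well_formed_xml(body: bytes) -> bool:
    """Return True if body parses as XML (well-formedness only, no RSS schema check)."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def check_feed(url: str, timeout: float = 10.0) -> bool:
    """Fetch url and report whether the response body is well-formed XML."""
    # urlopen raises HTTPError on 4xx/5xx responses, similar in spirit to curl -f.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_well_formed_xml(response.read())
```

Calling check_feed("http://0.0.0.0:8080") against a locally running server would mirror the default API_URL used by the task.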

logs

Query CloudWatch logs for recent Lambda activity

Inputs: DURATION

Environment: DURATION=1h

aws logs tail /aws/lambda/rss_aggregator --since "$DURATION" --format short

create-s3-bucket

One-off commands to set up the AWS S3 bucket that terraform will use to store infrastructure state. Run aws configure first to authenticate if necessary.

aws s3 mb s3://brunns-rss-agg-terraform-state --region eu-west-2
aws s3api put-bucket-versioning --bucket brunns-rss-agg-terraform-state --versioning-configuration Status=Enabled

upload-feeds

Upload feeds.txt (by default) to the S3 feeds bucket

Inputs: FEEDS_FILE, BUCKET, OBJECT

Environment: FEEDS_FILE=feeds.txt

Environment: BUCKET=brunns-rss-agg-feeds

Environment: OBJECT=feeds.txt

aws s3 cp "$FEEDS_FILE" s3://"$BUCKET"/"$OBJECT"

Initial setup steps

Use brunns-python-template or similar.

Footnotes

  1. How is there so much sport in the world, and so many people writing and talking about it?

  2. Or mostly only - the sub-editors do seem to do some questionable tagging sometimes.

  3. Currently Feedly.

  4. On a Mac - I'm not sure what you might use on other platforms.
