
ActivityColander: A Fediverse Anti-Spam Gateway

About

ActivityColander is a Fediverse spam gateway, designed either to keep unwanted messages from reaching your ActivityPub server or to tag them for later handling.

The goal is to make it extremely easy and fast for a Fediverse instance administrator to protect their users from unsolicited commercial messages, abuse/harassment, hateful content or other material that violates the instance's policies.

ActivityColander (AC) is designed to be placed between your HTTP reverse proxy (nginx, Traefik, Caddy, etc.) and your ActivityPub server (Mastodon, Pleroma, Pixelfed, etc.).

flowchart LR
    Internet --> RP(Reverse Proxy) --> AC(ActivityColander) --> AP(ActivityPub Server)

While it would be possible to drive all external traffic through ActivityColander, the right approach is to route only ActivityPub traffic from other servers through it.

flowchart LR
    Internet --> RP(Reverse Proxy)
    RP --> AC(ActivityColander)
    RP --> AP(ActivityPub Server)
    AC --> AP

Installation

Overview

In most cases, your ActivityPub server will be sitting behind an existing reverse proxy server. This reverse proxy will be proxying your SSL/TLS connection, as well as possibly sending requests to different backend services.

In some cases, your ActivityPub server may come with a built-in web server that combines all of these services. For example, GoToSocial includes a web server that can automatically retrieve a TLS certificate from the Let's Encrypt project.

For installations where the SSL proxy is separate, installing ActivityColander is fairly straightforward. But if you're using one of these bundled installations, you will need to "unbundle" your services in order to inject ActivityColander.

Even if you're using a server like Mastodon or Pixelfed, please read the following section on GoToSocial and Docker Compose, as this simple example will make understanding your installation easier.

GoToSocial Docker Compose Installation Example

THIS SECTION IS INCOMPLETE/POSSIBLY INCORRECT

We mentioned GoToSocial earlier, so let's work through an example installation using GoToSocial with Docker behind a Caddy server.

Your installation may be quite a bit different from our example. This example is just for illustration.

If you're running GoToSocial with Docker Compose behind the lucaslorentz/caddy-docker-proxy image, your existing docker-compose.yml file may look something like:


version: "3.3"

services:
  caddy:
    image: lucaslorentz/caddy-docker-proxy:ci-alpine
    ports:
      - 80:80
      - 443:443
    environment:
      - CADDY_INGRESS_NETWORKS=proxy
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./caddy_data:/data

  gotosocial:
    image: superseriousbusiness/gotosocial:latest
    networks:
      - proxy
    labels:
      caddy: example.org
      caddy.reverse_proxy: "{{upstreams 8080}}"

networks:
  proxy:
    external: true

In order to add ActivityColander to the mix, we need to do four things:

  1. We need the ActivityColander software
  2. We need Redis, which ActivityColander uses for persistent storage
  3. We need to configure Caddy to proxy server-to-server communications through ActivityColander
  4. We need to configure ActivityColander to send the messages it receives on to GoToSocial

Whatever software stack you're using (e.g. Mastodon with Traefik), your installation will follow roughly the same four steps.

To install ActivityColander and Redis, we'll add the following two services to our docker-compose.yml. Please note that this installation won't work as shown yet!


  activitycolander:
    build: https://gitlab.com/babka_net/activitycolander.git#deployment
    environment:
      - REDIS_HOST=redis

  redis:
    image: redis:5.0-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
    volumes:
      - ./redis:/data

If we add this configuration as shown, ActivityColander and Redis will be installed, but not configured.

We must tell Caddy to proxy server-to-server messages through ActivityColander. Luckily for us, in GoToSocial these messages all go to paths matching the pattern */inbox, so let's do that now:


  gotosocial:
    image: superseriousbusiness/gotosocial:latest
    networks:
      - proxy
    labels:
      caddy: example.org
      # Named matcher for any path ending in /inbox
      caddy.@inbox.path: "*/inbox"
      caddy.reverse_proxy_0: "@inbox activitycolander:8000"
      caddy.reverse_proxy_1: "{{upstreams 8080}}"

As confusing as it is to add this configuration to gotosocial, we're using the labels directive to tell Caddy to direct requests to */inbox to activitycolander, running on port 8000.
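To make the routing rule concrete, here is a minimal Python sketch of the decision the reverse proxy makes for each request. It is only an illustration: the function name route_for and the returned backend labels are hypothetical, and in a real deployment Caddy does the matching.

```python
from fnmatch import fnmatchcase

INBOX_PATTERN = "*/inbox"  # the pattern described in the text above


def route_for(path):
    """Return the (hypothetical) backend label a request path is sent to."""
    if fnmatchcase(path, INBOX_PATTERN):
        # Server-to-server deliveries go through ActivityColander first.
        return "activitycolander:8000"
    # Everything else goes straight to the ActivityPub server.
    return "gotosocial:8080"
```

With this rule, /users/alice/inbox is sent to ActivityColander, while a client API path such as /api/v1/timelines/home bypasses it.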

That means our last step is to tell ActivityColander to send messages back to GoToSocial. To do that, we just use the AP_SERVER and AP_PORT environment variables:



  activitycolander:
    build: https://gitlab.com/babka_net/activitycolander.git#deployment
    environment:
      - REDIS_HOST=redis
      - AP_SERVER=gotosocial
      - AP_PORT=8080

Mastodon Docker Installation

Most ActivityPub servers today run Mastodon, so let's discuss ActivityColander and Mastodon.

Mastodon serves HTTP traffic from two services, puma and streaming.

When you install Mastodon, you need to put a reverse proxy in front of it to direct requests to the right backend application server.

This is an ideal place to put ActivityColander in the mix.

Assuming you're using nginx, you probably have an nginx.conf that looks something like:

  location @proxy {
    ...
  }

  location /api/v1/streaming {
    ...
  }

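We can simply add a new stanza that proxies requests matching */inbox to ActivityColander. Because nginx location blocks don't accept shell-style wildcards, the pattern is written as a regular expression:

  location ~ /inbox$ {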
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header Proxy "";
    proxy_pass_header Server;

    proxy_pass http://activitycolander:8000;
    proxy_buffering off;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;

    tcp_nodelay on;
  }

Alternatively, you can use the technique outlined in the GoToSocial section. That's entirely up to you.

Since Mastodon already has Redis, there's no need to install Redis twice, so we recommend that you simply assign ActivityColander a second Redis database. To do this, use REDIS_URL instead of REDIS_HOST, as follows:

...
    environment:
      - REDIS_URL=redis://redis/2
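The trailing /2 in that URL is the database index. As a rough illustration of how such a URL is conventionally read, here is a small Python sketch; the helper name parse_redis_url is hypothetical, and this is not ActivityColander's own parsing code.

```python
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL into (host, port, database index)."""
    parts = urlparse(url)
    host = parts.hostname
    port = parts.port or 6379  # 6379 is the default Redis port
    db_text = parts.path.lstrip("/")
    database = int(db_text) if db_text else 0  # no path means database 0
    return host, port, database
```

For redis://redis/2 this yields the host redis, the default port, and database 2, which keeps ActivityColander's keys apart from whatever Mastodon stores in its own database.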

If you use Redis in a cluster, this won't work, because Redis Cluster does not support multiple databases. In this case, you have two reasonable options. The first option is to simply point ActivityColander at the same Redis server that Mastodon uses. ActivityColander is very careful with key name selection, so a conflict is extremely unlikely.

The second, safer option is to run a separate Redis instance for ActivityColander.

From Source

While more difficult than using the Docker image, it is also possible to install and configure ActivityColander from source. The easiest way to do this is to use the official OpenResty distribution packages and follow their instructions.

Depending on your installation procedure, OpenResty will be installed in different locations. For simplicity's sake, we will call the OpenResty installation path /usr/local/openresty from here on, though this may be different on your system.

Once OpenResty itself is installed and configured, you can drop the activitycolander directory somewhere on your system. We recommend placing it under the OpenResty installation nginx directory, ie /usr/local/openresty/nginx/activitycolander, though that is entirely up to you.

Then, whatever path you've chosen for ActivityColander, you need to add that path to your lua_package_path in your nginx.conf. Finally, set access_by_lua_file to the http-access.lua file inside ActivityColander, eg /usr/local/openresty/nginx/activitycolander/http-access.lua. It's probably easiest to simply look at our example nginx.conf file and use that as a guide.

When installing ActivityColander manually, you will also need to be sure to configure any additional libraries or services that we use. For example, if a check uses Redis, then you will need to configure a Redis server and make it available to your AC server as well.

Whether you use Docker or not, looking at our Dockerfile may help guide you through the process.

Redis

In addition to ActivityColander itself, you will also need to install Redis for ActivityColander to function.

If you are using Docker Compose, then you only need to install Redis the standard Docker way, and ensure that the service name is redis, eg:


services:
  redis:
    image: redis:6.2-alpine
    volumes: 
      - ./redis_data:/data
      

If you install Redis under another hostname (eg localhost if you aren't using Docker Compose) then you will need to modify redis_host, and possibly redis_port, in your config.lua.

A Brief Guide to Understanding ActivityColander

This is a very brief overview of what exactly ActivityColander is doing, just sufficient for you to configure it.

ActivityColander takes in an incoming HTTP request, determines if it's an ActivityPub message, and if it is, tries to determine if the message is good (ham) or bad (spam). If you're familiar with email systems like rspamd or SpamAssassin, it works nearly the same way. If you aren't familiar with either of them, it works very much like the Chocolate Egg room from Willy Wonka.

Check Overview

In the Chocolate Egg room, every chocolate egg is checked to see whether it's a good egg or a bad egg. ActivityColander does the same thing, except with ActivityPub messages. But since we don't have a magic scale, we use a series of automated checks on the message.

This graph should help you understand the process:

graph TD
   IsActivityPub(Is an Activity Pub Message?) --> |No| PassThrough(Pass it through)
   IsActivityPub --> |Yes| IsSpammerDomain{Comes from a known spammer domain?}
   subgraph Check Pipeline
   IsSpammerDomain --> RecipientRelationship{Recipient is following sender?}
   RecipientRelationship --> ContainSpamTerms{Contains spam terms?}
   ContainSpamTerms --> FinalTally{Final Tally}
   end
   FinalTally ----> |Certainly Spam| BlockMessages(Reject Message)
   FinalTally ---> |Likely Spam| TagMessage(Tag Message as Spam)
   FinalTally --> | Likely Ham | PassThrough
   TagMessage --> |Pass along the tagged message |PassThrough

The Check Pipeline contains each check that is run, and those checks are configurable, both in terms of which checks are run against a message and in terms of configuration for the checks themselves.

Understanding Scores

During the Check Pipeline, each check will return a final score for that check, along with an optional plain text note. The score from each check is between -1 (Ham) and 1 (Spam). A score may be an integer or floating point number between those two values. A score of 0 indicates no knowledge of whether the message is ham or spam.

Because some checks are better indicators of spamminess than others, the pipeline also contains a weight for each check.

Then, during the tally, the weighted scores are added up and averaged. From there, ActivityColander looks at two configurable values, block_threshold and spam_threshold. If the message score exceeds block_threshold, the ActivityPub message is rejected and the sender receives an HTTP error status such as 403 (Forbidden). If the score exceeds spam_threshold but not block_threshold, the message is tagged as spam and sent along to the ActivityPub server, where hopefully the ActivityPub server will do something sensible with it, such as putting it in quarantine.
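The tally can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ActivityColander's actual Lua code: it assumes the average is the weighted sum divided by the total weight, and the CheckResult type, the tally function and the threshold values 0.5 and 0.9 are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    score: float   # between -1 (ham) and 1 (spam); 0 means "no opinion"
    weight: float  # how much this check counts in the tally
    note: str = ""


def tally(results, spam_threshold=0.5, block_threshold=0.9):
    """Return (decision, average) for a list of CheckResult objects.

    The thresholds are illustrative defaults, not the project's values.
    """
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return "pass", 0.0
    # Assumed formula: weighted mean of the per-check scores.
    average = sum(r.score * r.weight for r in results) / total_weight
    if average > block_threshold:
        return "block", average  # reject, e.g. with HTTP 403
    if average > spam_threshold:
        return "tag", average    # forward, but marked as spam
    return "pass", average       # forward untouched
```

For example, a strong spam signal with weight 2 and a strong ham signal with weight 1 average out to about 0.33, which stays below both thresholds, so the message passes.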

Bypassing the Pipeline

It is also possible for a check to be written in such a way as to cause the message to bypass the rest of the pipeline. This is useful in situations where a check is acting as an "Allow List" or "Deny List". We do not configure any checks to do this by default, and do not recommend this behavior, but it is possible!
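As a rough sketch of the idea, the Python below runs checks in order and stops as soon as one asks to bypass the rest. The three-value return convention (score, note, bypass), the function names and the example domain are all hypothetical; the real checks are Lua modules, and their interface may differ.

```python
def run_pipeline(message, checks):
    """Run checks in order; stop early if a check asks to bypass the rest.

    Each check is a hypothetical callable returning (score, note, bypass).
    """
    results = []
    for name, check in checks:
        score, note, bypass = check(message)
        results.append((name, score, note))
        if bypass:
            break  # later checks never see this message
    return results


def allow_list_check(message):
    """Example allow list: trusted senders skip the remaining checks."""
    trusted = {"friendly.example"}
    if message["domain"] in trusted:
        return -1.0, "sender is on the allow list", True
    return 0.0, "", False


def keyword_check(message):
    """Example content check that never bypasses the pipeline."""
    if "buy now" in message["text"].lower():
        return 1.0, "contains spam term", False
    return 0.0, "", False
```

The sketch also shows why the behavior is risky: a message from an allow-listed domain is never inspected by the keyword check, however spammy its content.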

Configuration

In some lucky cases the defaults will be all you need, but most of the time you'll need to do some additional configuration.

We provide a default set of config files in this repository in the config directory.

With Docker/Docker Compose

If you're using Docker or Docker Compose, then the easiest way to deal with configuration is to mount your copy of that directory over /config.

In Docker Compose, this would look something like:


  activitycolander:
    image: registry.gitlab.com/babka_net/activitycolander:v0.6.3
    volumes:
      - ./config:/config

Without Docker

If you're not using Docker, then you'll need to place the config files relative to your ActivityColander installation, eg /usr/local/openresty/nginx/activitycolander/config, or you can modify the lua_package_path in the nginx.conf to point wherever you like. Our example uses /config for our Docker installation, but you may want to choose someplace like /etc/activitycolander if you feel that makes more sense for your installation.

Understanding the configuration variables

The main configuration file is config.lua.

There, you can set or override a number of values. The file is heavily commented, so you should be able to understand what you're configuring by looking at it. It sets variables such as the network location of your Redis server, the default spam and ham scores, and whether the system is in dry_run mode.

There is also a check_pipeline.lua file which contains the configuration for the Check Pipeline. In other words, it specifies which checks are run and the configuration for each check.

Development

Check Development

If you wish to develop on ActivityColander, either for your own instance or generally, you're encouraged to do so. The most likely starting place will be to develop your own checks. The best thing to do there would be to examine existing checks.

ActivityPub Integration

While ActivityColander can block spam messages on its own, its better use is to mark spam Activities as such and let the ActivityPub server decide how to handle them. It does this by passing two HTTP headers to the ActivityPub server, ActivityPub-Spam-Result and ActivityPub-Spam-Details.

ActivityPub-Spam-Result contains a string in the form: <decision> [ <score> / <max> ] where decision is whether the ActivityColander server believes the message to be spam (spam or passed), score is the weighted result of the tests, and max is the maximum weighted spamminess score of the tests.

ActivityPub-Spam-Details contains a semicolon (;) delimited list of tests in the form of: <name> (<score>)[<notes>] where name is the name of the test (or check name by default), score is the score from the test, and notes are any plain text outcome the test provided.
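An ActivityPub server (or a script reading its logs) could pick these headers apart along the following lines. This Python sketch is based only on the two formats described above; the function names are hypothetical, and it assumes that notes never contain a semicolon, which the format description does not guarantee.

```python
import re

# <decision> [ <score> / <max> ]
RESULT_RE = re.compile(
    r"^\s*(\w+)\s*\[\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)\s*\]\s*$"
)
# <name> (<score>)[<notes>]
DETAIL_RE = re.compile(r"^\s*(.+?)\s*\((-?[\d.]+)\)\s*\[(.*)\]\s*$")


def parse_spam_result(value):
    """Parse ActivityPub-Spam-Result into (decision, score, max)."""
    match = RESULT_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized result header: {value!r}")
    decision, score, maximum = match.groups()
    return decision, float(score), float(maximum)


def parse_spam_details(value):
    """Parse ActivityPub-Spam-Details into a list of (name, score, notes)."""
    details = []
    for part in value.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        match = DETAIL_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognized detail entry: {part!r}")
        name, score, notes = match.groups()
        details.append((name, float(score), notes))
    return details
```

With a made-up header value such as "spam [ 1.5 / 3 ]", parse_spam_result returns the decision spam with a score of 1.5 out of a maximum of 3.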