HahaBimbi

HahaBimbi uploads RSS feeds or any textual content into the Mastodon social network. It can also handle references to remote media attachments.

About

There are several RSS-to-Mastodon Python scripts. Unfortunately, I was not satisfied with their stability, their internals, or their message-formatting capabilities. That is why I extracted some snippets from my previous project and repackaged the result as HahaBimbi.

Notice: Haha Bimbi is a completely fictional character who appears in several Czech children's books. This software is not related to her.

Usage

Download the executable JAR package, which already contains all dependencies, and rename the file to egor.jar so that the command lines are shorter.

HahaBimbi works in several independent steps. If you do not know the HahaBimbi parameters yet, issue:

$ java -jar egor.jar
egor.mastodon.Hahabimbi supported commands

help           This help page
concentrator   Fetch RSS
register       Register this application to Mastodon instance
toot           Upload your posts
vacuum         Garbage collector
media          Media uploader

The help page lists five operations/steps:

register: registers the import application with a Mastodon account
concentrator: fetches RSS sources, modifies their format, and prepares an intermediate file that can be imported into Mastodon
toot: the import application, which imports your statuses into a registered Mastodon account
vacuum: a utility that helps you manage the support database of known statuses
media: a remote media uploader

In this project, the concentrator operation is a simple aggregator. My other projects use an advanced version of this module to analyze the aggregated data and act on it, so I kept the original module name.

The register operation should be executed only once per account. The concentrator and toot operations should be launched periodically from cron, e.g. every hour. The vacuum operation should be run from time to time, e.g. every week, to optimize the data structures (see below).

Each operation shows its help page if you run it with the -help parameter, e.g.:

$ java -jar egor.jar concentrator -help
usage: egor.core.Concentrator [-a <ID>] [-f <FORMAT_STRING>] [-fc <FILENAME>] [-fm <LENGTH>] [-help] [-i <FILENAME>] [-r <FILENAME>] [-ri] [-u <URL>]
 -a,--user-agent <ID>                          user-agent identification
 -f,--format <FORMAT_STRING>                   TITLE, DESC, LINK, TAGS variables and any text
 -fc,--formatted-category-aliases <FILENAME>   category aliases table
 -fm,--formatted-max-len <LENGTH>              maximum length of one item output after formatting,
                                               default: 450
 -ft,--formatted-max-tags <COUNT>              maximum number of tags extracted from RSS categories,
                                               default: 12
 -help                                         print this message
 -i,--rss-index <FILENAME>                     file with RSS URLs
 -r,--refs-db <FILENAME>                       output file with references to extracted media attachments
 -ri,--extract-img-src                         extract media attachments from img src of item description
 -u,--rss-url <URL>                            input RSS URL
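The -ri option extracts media attachments from the img src attributes of an item description. A rough Java sketch of that extraction follows; it uses a regular expression for brevity, whereas a robust implementation would more likely use an HTML parser. The ImgSrcExtractor class is hypothetical, and how HahaBimbi really parses descriptions is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of -ri: collect src URLs of <img> tags in an item description.
final class ImgSrcExtractor {
    // Matches a whitespace-preceded src="..." or src='...' inside one <img ...> tag.
    // Not a full HTML parser; attributes such as data-src are deliberately skipped.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> extract(String descriptionHtml) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(descriptionHtml);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>Story</p><img alt='photo' src=\"http://example.com/a.jpg\">"));
    }
}
```

The collected URLs are what a references file such as the one passed with -r would plausibly hold; that mapping is an assumption.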

Let us go through the simplest import process (one account and one RSS feed) step by step. This guide assumes that your account army@raja.egothor.org is hosted on raja.egothor.org and that you will import a single RSS feed, http://feeds.reuters.com/reuters/technologyNews?format=xml

Note: you may import many RSS feeds into a single account, and you may also import various RSS feeds into many Mastodon servers. The core algorithms have linear complexity, so the import capacity is almost unlimited.

Registration with Mastodon

You must register the import application and receive an access token for the Mastodon account:

$ java -jar egor.jar register -h raja.egothor.org -u army@raja.egothor.org -p TYPE_PASSWORD_HERE
# save as your configuration file for later commands
home          = raja.egothor.org
client_id     = 9**************************************************************3
client_secret = 1**************************************************************c
access_token  = f**************************************************************f
created_at    = 1550501491
scope         = read write follow
token_type    = Bearer

Save this output to the import-001.cfg file. It contains an access_token that lets you post new statuses as army@raja.egothor.org, so keep this file private. If you want to revoke the authorization, log in as army@raja.egothor.org, open Settings, then Authorized applications, and click the cancel link.

Note: you can use this configuration file for many imports with the same account.
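The configuration file printed by register is a list of key = value lines, with # starting a comment. That layout happens to be compatible with java.util.Properties, so reading it from your own Java tooling could look like the sketch below. The ConfigReader class is hypothetical, and Properties compatibility is an assumption based only on the sample output above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: parses the "key = value" output of the register operation.
final class ConfigReader {
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            // Properties accepts "key = value" (spaces around '=') and skips '#' comment lines.
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String sample = "# save as your configuration file for later commands\n"
                + "home          = raja.egothor.org\n"
                + "token_type    = Bearer\n";
        Properties cfg = parse(sample);
        System.out.println(cfg.getProperty("home") + " " + cfg.getProperty("token_type"));
    }
}
```

For the real file, Properties.load(Files.newBufferedReader(Path.of("import-001.cfg"))) would read it directly.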

Prepare RSS import

Concentrator is an operation that reads several RSS feeds and produces a single output. It can produce an aggregated RSS feed or (more importantly for us right now) an input stream for the Mastodon importer (the toot operation, see below).

Concentrator reads the http://feeds.reuters.com/reuters/technologyNews?format=xml feed by default if you do not specify another one, which is fine for this short guide. You can run concentrator without any parameters; it shows the pretty-printed RSS feed we are going to process.

For Mastodon, you need short statuses (up to 500 characters) instead of XML. We will limit the status length to 450 characters in this guide. Issue:

$ java -jar egor.jar concentrator -f "TITLE: DESC LINK TAGS #Reuters" -fm 450

For practical reasons, you can suppress the log messages by sending STDERR to /dev/null:

$ java -jar egor.jar concentrator -f "TITLE: DESC LINK TAGS #Reuters" -fm 450 2>/dev/null
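The -f pattern and the -fm limit can be illustrated with a short Java sketch. It substitutes the TITLE, DESC, LINK and TAGS variables in a single pass and, when the result exceeds the limit, shortens only the description and appends an ellipsis. Which part HahaBimbi actually truncates is not stated here, so that rule, like the StatusFormatter class itself, is a hypothetical choice for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the -f / -fm behaviour; not HahaBimbi's actual implementation.
final class StatusFormatter {
    // The variables documented for -f, matched as whole words.
    private static final Pattern VAR = Pattern.compile("\\b(TITLE|DESC|LINK|TAGS)\\b");

    static String fill(String template, Map<String, String> values) {
        // One pass over the template, so a title containing "DESC" is not expanded again.
        return VAR.matcher(template).replaceAll(
                m -> Matcher.quoteReplacement(values.getOrDefault(m.group(1), "")));
    }

    static String format(String template, String title, String desc, String link, String tags, int maxLen) {
        String full = fill(template, Map.of("TITLE", title, "DESC", desc, "LINK", link, "TAGS", tags));
        int overflow = full.length() - maxLen;
        if (overflow <= 0) {
            return full;
        }
        // Assumed rule: shorten only DESC, leaving one char for the ellipsis (U+2026).
        // The result can still exceed maxLen if the other parts alone are too long.
        int keep = Math.max(0, desc.length() - overflow - 1);
        String shortDesc = desc.substring(0, keep) + "\u2026";
        return fill(template, Map.of("TITLE", title, "DESC", shortDesc, "LINK", link, "TAGS", tags));
    }

    public static void main(String[] args) {
        System.out.println(format("TITLE: DESC LINK TAGS #Reuters", "Headline",
                "A rather long description of the article", "http://example.com/a", "#TechNews", 60));
    }
}
```

String.length() counts UTF-16 units, while Mastodon counts characters and treats links specially, which is one reason a safety margin such as 450 instead of 500 is sensible.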

The output contains one status per line. Because you formatted each line with the "TITLE: DESC LINK TAGS #Reuters" pattern, every line ends with your own tag "#Reuters", placed after the tags taken from the RSS feed. Unfortunately, foreign tags are often impractical. Here the RSS feed uses "#technologyNews", which you may prefer to replace with "#TechNews". To do so, use a table of aliases: prepare a file, e.g. aliases.cfg, with the following content:

TechNews technologyNews

and issue:

$ java -jar egor.jar concentrator -f "TITLE: DESC LINK TAGS #Reuters" -fm 450 -fc aliases.cfg

The feed category technologyNews now appears as the tag "#TechNews".
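The alias lookup can be sketched as a map from the original feed category to the replacement tag. Only the single-pair line "TechNews technologyNews" is documented above, so reading the first word as the new tag and every following word as a category it replaces is an assumption, as is the AliasTable class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical alias table; assumed line format: "<NewTag> <OriginalCategory> [<OriginalCategory> ...]".
final class AliasTable {
    private final Map<String, String> aliasOf = new HashMap<>();

    static AliasTable parse(List<String> lines) {
        AliasTable table = new AliasTable();
        for (String line : lines) {
            String[] words = line.trim().split("\\s+");
            // Every word after the first is a feed category mapped to the first word.
            for (int i = 1; i < words.length; i++) {
                table.aliasOf.put(words[i], words[0]);
            }
        }
        return table;
    }

    /** Turns a feed category into a hashtag, using the alias when one is defined. */
    String toTag(String category) {
        return "#" + aliasOf.getOrDefault(category, category);
    }

    public static void main(String[] args) {
        AliasTable t = parse(List.of("TechNews technologyNews"));
        System.out.println(t.toTag("technologyNews"));
    }
}
```

With a real file, AliasTable.parse(Files.readAllLines(Path.of("aliases.cfg"))) would build the table; unknown categories pass through unchanged.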

You can save the output of concentrator to a file or send it through a pipe. We will use the first option:

$ java -jar egor.jar concentrator -f "TITLE: DESC LINK TAGS #Reuters" -fm 450 -fc aliases.cfg >import.txt

Import statuses into Mastodon

You have prepared your credentials in the import-001.cfg file and the import data in import.txt. The last step is the actual import:

$ java -jar egor.jar toot -f import-001.cfg -v public -u hash-imported-001.db < import.txt
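For orientation, the Mastodon REST API creates a status with POST /api/v1/statuses, authenticated with the access_token as a Bearer token, taking the text in the status parameter and the visibility (public, unlisted, private or direct) in the visibility parameter. The sketch below only builds such a request with java.net.http and does not send it. Whether toot issues exactly this request, and that -v maps to the visibility parameter, are assumptions; the StatusRequest class is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: builds (but does not send) a Mastodon "post status" request.
final class StatusRequest {
    static String formBody(String status, String visibility) {
        return "status=" + URLEncoder.encode(status, StandardCharsets.UTF_8)
                + "&visibility=" + URLEncoder.encode(visibility, StandardCharsets.UTF_8);
    }

    static HttpRequest build(String home, String accessToken, String status, String visibility) {
        return HttpRequest.newBuilder(URI.create("https://" + home + "/api/v1/statuses"))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(status, visibility)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest r = build("raja.egothor.org", "ACCESS_TOKEN", "Hello #Reuters", "public");
        System.out.println(r.method() + " " + r.uri());
    }
}
```

Sending it would be HttpClient.newHttpClient().send(r, HttpResponse.BodyHandlers.ofString()), with the home and access_token values taken from import-001.cfg.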

The import process saves the MD5 hashes of all posted statuses into the hash-imported-001.db database. If you execute the same toot command again, nothing will be imported:

$ java -jar egor.jar toot -f import-001.cfg -v public -u hash-imported-001.db < import.txt
:
:
INFO: Number of statuses added: 0
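The duplicate check can be sketched with the JDK's MessageDigest: hash each status text with MD5, remember the hash together with the time it was stored, and skip statuses whose hash is already known. The SeenStatuses class, the in-memory map and the UTF-8 encoding of the text are assumptions; the on-disk format of hash-imported-001.db is not described here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory version of the "known statuses" database.
final class SeenStatuses {
    // hex MD5 of the status text -> epoch seconds when the hash was stored
    final Map<String, Long> hashes = new HashMap<>();

    static String md5Hex(String status) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(status.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));  // zero-padded to 32 hex digits
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support MD5", e);
        }
    }

    boolean isNew(String status) {
        return !hashes.containsKey(md5Hex(status));
    }

    /** Call after a successful post, so that a failed upload is retried on the next run. */
    void record(String status, long nowEpochSeconds) {
        hashes.putIfAbsent(md5Hex(status), nowEpochSeconds);
    }

    public static void main(String[] args) {
        SeenStatuses db = new SeenStatuses();
        for (String s : new String[] {"first", "second", "first"}) {
            if (db.isNew(s)) {
                db.record(s, System.currentTimeMillis() / 1000);
                System.out.println("posting: " + s);
            }
        }
    }
}
```

Separating isNew from record is a design choice: a hash is only remembered once its status has actually been posted.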

Each MD5 hash is saved with the timestamp of its allocation, which allows you to remove obsolete hashes.

For instance, to keep only hashes that are not older than 7 days, issue:

$ java -jar egor.jar vacuum -u hash-imported-001.db -d 7

It reports something like this (the numbers depend on the number of items in your Reuters RSS feed):

INFO: Statuses total: 20, live: 20, to-be-removed: 0, saving...

This means that all 20 hashes are recent and will not be removed. If you want to see some action, tell the cleaner to eliminate all hashes older than 0 days (i.e. everything stored before now):

$ java -jar egor.jar vacuum -u hash-imported-001.db -d 0
INFO: Statuses total: 20, live: 0, to-be-removed: 20, saving...

In this case, all hashes are removed.
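The -d rule can be sketched as a cutoff of now minus the given number of days: every hash stored before the cutoff is dropped, so -d 0 removes everything stored before the moment of the run, as in the example above. The Vacuum class and the hash-to-timestamp map are hypothetical stand-ins for the real database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "vacuum -d DAYS" over an in-memory hash -> timestamp map.
final class Vacuum {
    /** Removes hashes stored more than {@code days} days before {@code nowEpochSeconds}; returns how many were removed. */
    static int vacuum(Map<String, Long> hashes, int days, long nowEpochSeconds) {
        long cutoff = nowEpochSeconds - TimeUnit.DAYS.toSeconds(days);
        int total = hashes.size();
        hashes.values().removeIf(storedAt -> storedAt < cutoff);
        return total - hashes.size();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        Map<String, Long> hashes = new HashMap<>();
        hashes.put("old", now - TimeUnit.DAYS.toSeconds(8));
        hashes.put("recent", now - TimeUnit.HOURS.toSeconds(1));
        System.out.println(vacuum(hashes, 7, now));  // 1: only "old" is older than 7 days
        System.out.println(vacuum(hashes, 0, now));  // 1: with 0 days, "recent" goes too
    }
}
```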
