Concentrator

In this project, the concentrator operation is a simple aggregator. My other projects use an advanced version of this module to analyze and act on the aggregated data, so I kept the original module name.

$ java -jar egor.jar concentrator -help
usage: egor.core.Concentrator [-a <ID>] [-f <FORMAT_STRING>] [-fc <FILENAME>] [-fm <LENGTH>] [-help] [-i <FILENAME>] [-r <FILENAME>] [-ri] [-u <URL>]
 -a,--user-agent <ID>                          user-agent identification
 -f,--format <FORMAT_STRING>                   TITLE, DESC, LINK, TAGS variables and any text
 -fc,--formatted-category-aliases <FILENAME>   category aliases table
 -fm,--formatted-max-len <LENGTH>              maximum length of one item output after formatting,
                                               default: 450
 -ft,--formatted-max-tags <COUNT>              maximum number of tags extracted from RSS categories,
                                               default: 12
 -help                                         print this message
 -i,--rss-index <FILENAME>                     file with RSS URLs
 -r,--refs-db <FILENAME>                       output file with references to extracted media attachments
 -ri,--extract-img-src                         extract media attachments from img src of item description
 -u,--rss-url <URL>                            input RSS URL

This operation reads several RSS feeds and produces a single output. The output includes a textual representation of RSS in a format that is suitable for Mastodon import.

The output can be prepared on a server with fast internet connectivity; access to a Mastodon instance is not required.

--user-agent

Some servers do not return valid RSS unless the User-Agent HTTP header has a specific or well-known value. Local government offices often do this. If a feed fails for this reason, try "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0" or a similar browser string.
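At the HTTP level, the option amounts to setting one request header. The sketch below shows this with the standard java.net classes. The class name, the method name and the example.org URL are placeholders, and this is not necessarily how egor issues its requests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class UserAgentSketch {
    // The browser-like value suggested in the text above.
    static final String BROWSER_UA =
        "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0";

    // Prepares a request with the given User-Agent header.
    // Nothing is sent until the caller connects or reads the response.
    static HttpURLConnection prepare(String feedUrl, String userAgent) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // example.org is a placeholder, not a real feed.
        HttpURLConnection conn = prepare("https://example.org/feed.xml", BROWSER_UA);
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```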

--format

The format string specifies what your statuses will look like. It can contain any text together with the special variables TITLE, DESC, LINK and TAGS, which are replaced by values from the RSS feeds. Repeated whitespace and control characters are replaced by a single space.

If this parameter is specified, all --formatted-* parameters are applied as well.

If this parameter is missing, downloaded RSS feeds are aggregated and pretty-printed.
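The substitution and whitespace cleanup described above can be sketched as follows. This is an illustration only, not the tool's code: it assumes the variables are matched as plain substrings and that the cleanup runs once over the whole formatted text. The class and method names are made up for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatSketch {
    // The four variable names listed in the -help output.
    private static final Pattern VARIABLE = Pattern.compile("TITLE|DESC|LINK|TAGS");

    // Replaces each variable with the item's value in a single pass,
    // then collapses runs of whitespace and control characters to one space.
    static String format(String formatString, Map<String, String> item) {
        String substituted = VARIABLE.matcher(formatString).replaceAll(
            m -> Matcher.quoteReplacement(item.getOrDefault(m.group(), m.group())));
        return substituted.replaceAll("[\\s\\p{Cntrl}]+", " ");
    }

    public static void main(String[] args) {
        Map<String, String> item = Map.of(
            "TITLE", "Hello",
            "DESC", "World\tnews",
            "LINK", "https://example.org/1",
            "TAGS", "#a #b");
        // prints: Hello World news https://example.org/1 #a #b
        System.out.println(format("TITLE\n\nDESC  LINK TAGS", item));
    }
}
```

Substituting in a single pass avoids re-expanding a variable name that happens to occur inside a feed value.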

--formatted-category-aliases

The file defines a table of tag (category) aliases. Each line is one group of aliases. Every tag in a group is replaced by the first tag on its line. The lookup is case-insensitive.

Example:

Hacker HackerOne hackerNews hackMeUp
News RT CNN CBS
SpiesTalking BIS VZ UZSI GRU CIA NSA

Categories like hAckErnews, hackMeup, hackMeUp, hacker and HAcKER are all transformed to #Hacker. Categories like RT, rt, Cnn, CNN, etc. are all transformed to #News. Finally, the agency labels on the third line (in any letter case) are rewritten to #SpiesTalking.
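The lookup above can be pictured as a map from lower-cased alias to the first tag of its line. The following is a minimal sketch under assumptions: tags on a line are separated by whitespace (as in the example), the first definition wins when an alias appears twice, and categories without an alias pass through unchanged. The names are illustrative and not taken from egor.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AliasTableSketch {
    // Builds a lookup from lower-cased alias to the first tag on its line.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> canonical = new HashMap<>();
        for (String line : lines) {
            String[] tags = line.trim().split("\\s+");
            if (tags[0].isEmpty()) {
                continue; // skip blank lines
            }
            for (String tag : tags) {
                // Assumption: the first definition of an alias wins.
                canonical.putIfAbsent(tag.toLowerCase(Locale.ROOT), tags[0]);
            }
        }
        return canonical;
    }

    // Case-insensitive lookup; unknown categories are returned as-is (assumption).
    static String resolve(Map<String, String> canonical, String category) {
        return canonical.getOrDefault(category.toLowerCase(Locale.ROOT), category);
    }

    public static void main(String[] args) {
        Map<String, String> table = parse(List.of(
            "Hacker HackerOne hackerNews hackMeUp",
            "News RT CNN CBS",
            "SpiesTalking BIS VZ UZSI GRU CIA NSA"));
        System.out.println("#" + resolve(table, "hAckErnews")); // prints: #Hacker
        System.out.println("#" + resolve(table, "Cnn"));        // prints: #News
    }
}
```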

--formatted-max-tags

Sets the maximum number of unique category names that are accepted from the input RSS feed for a single item (status).
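One way to picture this limit is sketched below. It assumes that uniqueness is checked case-insensitively and that the first categories in feed order are the ones kept. Neither detail is stated above, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TagLimitSketch {
    // Keeps at most maxTags unique categories, in their original order.
    static List<String> limitTags(List<String> categories, int maxTags) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String category : categories) {
            if (kept.size() >= maxTags) {
                break;
            }
            // Assumption: "News" and "news" count as the same category.
            if (seen.add(category.toLowerCase(Locale.ROOT))) {
                kept.add(category);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints: [News, Tech]
        System.out.println(limitTags(List.of("News", "news", "Tech", "Linux"), 2));
    }
}
```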

--rss-index

A file with RSS feed URLs, one URL per line, each optionally followed by extra tags.

--refs-db

This file will contain the remote references that were discovered in the RSS feeds and that are also referenced from the stream printed on standard output.

--extract-img-src

This parameter instructs the program to extract IMG SRC URLs from RSS item description fields. It has no effect unless --refs-db is also specified.
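Extracting IMG SRC URLs from a description can be sketched with a regular expression, as below. This is a simplified illustration that assumes quoted src attributes. The actual extraction may work differently, and a real HTML parser would be more robust. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcSketch {
    // Matches <img ... src="URL"> or <img ... src='URL'>, in any letter case.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the src values of all img tags, in document order.
    static List<String> extract(String descriptionHtml) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(descriptionHtml);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String description =
            "<p>Hi</p><img class=\"x\" src=\"https://example.org/a.png\">";
        // prints: [https://example.org/a.png]
        System.out.println(extract(description));
    }
}
```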

--rss-url

A single RSS feed URL to process. If neither this parameter nor --rss-index is specified, a demo URL is processed instead.