User's Guide

# Introduction

curn is an RSS reader. It scans a configured set of URLs, each one representing an RSS feed, and summarizes the results. By default, curn keeps track of individual items within each RSS feed, using an on-disk cache; when using the cache, it will suppress displaying information for items it has already processed (though that behavior can be disabled).

Unlike many RSS readers, curn does not use a graphical user interface. It is a command-line utility, intended to be run periodically in the background by a command scheduler such as cron(8) (on UNIX-like systems) or the Windows Scheduler Service (on Windows).

curn can read RSS feeds from any URL that's supported by Java's runtime. When querying HTTP sites, curn uses the HTTP If-Modified-Since and Last-Modified headers to suppress retrieving and processing feeds that haven't changed (though a Force Feed Download plug-in, such as Retain Articles, can override that capability). By default, it also requests that the remote HTTP server gzip the XML before sending it. (Some HTTP servers honor the request; some don't.) These measures both minimize network bandwidth and ensure that curn is as kind as possible to the remote RSS servers. (There are some additional steps you can take to be more bandwidth-friendly.)
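As a rough illustration of the two HTTP measures described above, the following Python sketch builds (but does not send) a conditional, gzip-friendly request. It is not curn's code: the function name build_feed_request, its parameters, and the example URL are all assumptions made for this example.

```python
import email.utils
import urllib.request


def build_feed_request(url, last_modified=None, want_gzip=True):
    """Build a conditional GET request for a feed. Nothing is sent here."""
    request = urllib.request.Request(url)
    if last_modified is not None:
        # Send back the timestamp remembered from the previous download
        # (seconds since the epoch), formatted as an HTTP date.
        request.add_header(
            "If-Modified-Since",
            email.utils.formatdate(last_modified, usegmt=True),
        )
    if want_gzip:
        # Ask the server to compress the XML; it is free to ignore this.
        request.add_header("Accept-Encoding", "gzip")
    return request


# Hypothetical feed URL and timestamp, for illustration only.
request = build_feed_request("http://www.example.org/feed.rss", last_modified=0)
```

A server that honors the conditional header answers with status 304 (Not Modified) when the feed has not changed, so there is no feed body to download or parse.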

curn comes with a built-in adapter for the ROME feed parser, but it can easily be extended to use any RSS parser. (curn uses ROME by default.) See the ParserClass configuration item for information on how to specify which parser curn should use. See the section entitled Using an Unsupported RSS Parser for more details on adapting curn to use other RSS parsers.

curn supports several output formats; you can configure one or more output handlers in curn's configuration file. In addition, someone conversant with Java programming or comfortable with a scripting language, such as Python or Ruby, can easily extend curn to handle a new output format. See the section entitled Writing Your Own Output Handler for more details. Finally, as of version 2.6, curn has a built-in template-driven output handler, based on the FreeMarker template engine. This handler, the FreeMarkerOutputHandler, uses a text template to generate output, so anyone conversant with FreeMarker can easily write a custom template to generate custom output. See the section describing the FreeMarkerOutputHandler for more details.

curn's predefined output handlers can generate:

In addition, curn supports emailing its output. If email addresses are specified in the configuration file, then curn creates a MIME multipart/alternative email message [1], using the output of each output handler as one of the alternative attachments. (As of version 3.2, curn can also send individual email messages for each article; see the MailIndividualArticles parameter.)
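To make the multipart/alternative idea concrete, here is a minimal Python sketch that assembles such a message from two renderings of the same content. It only illustrates the message structure; it is not how curn builds its email, and the function name, addresses and bodies are invented for the example.

```python
from email.message import EmailMessage


def build_alternative_message(sender, recipients, subject, text_body, html_body):
    """Combine a plain-text and an HTML rendering into one MIME message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    # First alternative: plain text.
    message.set_content(text_body)
    # Adding a second rendering converts the message to multipart/alternative.
    message.add_alternative(html_body, subtype="html")
    return message


# Hypothetical addresses and bodies, for illustration only.
message = build_alternative_message(
    "joe@example.org",
    ["reader@example.org"],
    "curn output",
    "No new items.",
    "<p>No new items.</p>",
)
```

A mail client that receives the message displays whichever alternative it supports best, which is why each part is expected to carry the same information in a different format.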

# Terminology

Throughout this document, the following terms are used:

• curn_home is the curn installation directory.
• user_home is the home directory of the user who's running curn. (On Windows, the curn.bat front-end command script uses the value of the %HOME% variable to override the Java VM's notion of the user's home directory, if %HOME% is set.)

# curn Command Line Syntax

curn is invoked from the command line as follows:

curn [options] config

The curn graphical installer automatically creates a Unix shell script (called curn) or a Windows command file (curn.bat) in the bin directory beneath the curn installation directory. You must put the curn bin directory in your path.

Note: While it is possible to invoke curn via the java command, it's not recommended. For curn's plug-ins to work properly, curn must do some fancy class loader footwork. Basically, curn uses a special bootstrap class to find all plug-ins and create a special class loader that can load everything—plug-ins, core code, etc. If you don't invoke curn via the bootstrap class, the plug-ins don't load properly. The curn shell script and command file handle invoking curn so that plug-ins will work properly.

curn's command line uses a UNIX-like syntax. If you invoke curn without any parameters, you get the following usage display.

    Usage: curn [options] config

    OPTIONS:

    -B, --build-info
        Show full build information, then exit. This option shows a bit
        more information than the --version option. This option can be
        combined with the --plug-ins option to show the loaded plug-ins.
    -C, --no-cache
        Don't use a cache file at all.
    -e, --config-encoding encoding
        The encoding to use when reading the configuration file.
        Default: The default encoding for the Java runtime on the
        current operating system.
    --logging
        Enable logging via Jakarta Commons Logging.
    -p, --plug-ins
        Show the list of located plug-ins and output handlers, then
        exit. This option can be combined with either --build-info or
        --version to show version information, as well.
    -t, --time

Many of curn's command-line options simply override settings in the curn configuration file. Each option and argument is discussed in more detail, below.

## Command Line Options

OPTIONS
Short Option Long Option Explanation
-B --build-info Display detailed information about how and when curn was built, then exit without doing anything. Useful primarily when debugging or submitting problem reports. For instance, the command
curn -B
produces output similar to the following:
curn, version 3.0 (build 20060608.185936.321)

Build:          20060608.185936.321
Build date:     2006/06/08 14:59:36 EDT
Built by:       bmc on sunball.inside.clapper.org
Built on:       Linux 2.6.16-1.2122_FC5smp (i386)
Build Java VM:  Java HotSpot(TM) Client VM 1.5.0_07-b03 (Sun Microsystems Inc.)
Build compiler: javac
Ant version:    Apache Ant version 1.6.5 compiled on June 2 2005

For a simple one-line version display, use the --version option.

-C --no-cache Run without a cache. Each RSS item curn encounters will appear to be new and will be passed to the output handlers. Also see the CacheFile configuration directive.
-e encoding --config-encoding encoding Specify the encoding of the configuration file. The specified encoding can be any of the encodings supported by the underlying Java virtual machine. If you don't specify an encoding, curn will use the default encoding for the Java virtual machine. On Unix systems in the United States and western Europe, this is usually "ISO-8859-1"; on Windows systems, it is typically "Cp1252".
--logging Enable logging via the java.util.logging API. You will also have to specify a logging configuration file via a -Djava.util.logging.config.file system property. For instance,
java -Djava.util.logging.config.file=/tmp/logging.properties org.clapper.curn.Tool --logging ...
See the section entitled Logging for more details on specifying logging parameters.
-t <time> --time <time>

For the purposes of cache expiration, pretend the current time is <time>, instead of the wall clock time. <time> may be specified in one of the following formats:

2004/07/22 09:37:29 AM
2004/07/22 09:37:29
2004/07/22 09:37 AM
2004/07/22 09:37
2004/07/22 9:37 AM
2004/07/22 9:37
2004/07/22 09 AM
2004/07/22 9 AM
2004/07/22
04/07/22
09:37:29 AM
09:37:29
09:37 AM
09:37
9:37 AM
9:37
09 AM
9 AM

This option is useful primarily for debugging. Before reading the RSS feeds, curn first loads its cache and prunes any cache entries that are out of date. When pruning its cache of out-of-date items, or when loading cache items, curn will behave as if the current time is the specified time.
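The accepted time formats can be read as a list of patterns tried in turn. The Python sketch below is one way to model that; it is not curn's parser. The function name parse_time_option, the order in which patterns are tried, and the choice to combine a time-only value with today's date are all assumptions made for illustration.

```python
from datetime import date, datetime

# Patterns that carry a date, most specific first.
DATE_TIME_FORMATS = [
    "%Y/%m/%d %I:%M:%S %p",  # 2004/07/22 09:37:29 AM
    "%Y/%m/%d %H:%M:%S",     # 2004/07/22 09:37:29
    "%Y/%m/%d %I:%M %p",     # 2004/07/22 9:37 AM
    "%Y/%m/%d %H:%M",        # 2004/07/22 9:37
    "%Y/%m/%d %I %p",        # 2004/07/22 9 AM
    "%Y/%m/%d",              # 2004/07/22
    "%y/%m/%d",              # 04/07/22
]

# Patterns that carry only a time of day.
TIME_ONLY_FORMATS = ["%I:%M:%S %p", "%H:%M:%S", "%I:%M %p", "%H:%M", "%I %p"]


def parse_time_option(text, today=None):
    """Parse a -t/--time style value into a datetime."""
    for fmt in DATE_TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    for fmt in TIME_ONLY_FORMATS:
        try:
            time_of_day = datetime.strptime(text, fmt).time()
        except ValueError:
            continue
        # Assumption: a bare time refers to the current day.
        return datetime.combine(today or date.today(), time_of_day)
    raise ValueError(f"unrecognized time: {text!r}")
```

For instance, parse_time_option("2004/07/22 9:37 AM") and parse_time_option("2004/07/22 09:37") yield the same moment.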

-u --no-update Load (and prune) the cache file before processing the RSS feeds, but do not save the modified in-memory cache back to disk. Useful primarily for debugging.
-v --version Show just the one-line version information, then exit. For more detailed curn build and version information, use the --build-info option.

## Command Line Parameters

A list of curn's positional parameters follows.

PARAMETERS
Positional Parameter Explanation
config The path or URL to the curn configuration file. This parameter is required.

# The curn Configuration File

curn's configuration file controls all aspects of curn's behavior. The configuration file contains parameters that control curn's behavior, the output handlers, and the individual RSS feed sites. This section first describes the overall configuration file syntax, and then describes each curn configuration item in detail.

You can view a sample curn configuration file by following this link.

## Configuration File Syntax

curn's configuration file is a simple text file. It resembles a standard Java properties file, but it is broken into individual sections, each of which has its own variable namespace. At a glance, the configuration file is reminiscent of a Windows .INI file, but there are quite a few differences [2].

Like a .INI file, each section in the configuration file consists of a name surrounded by brackets. Each section contains variable assignments; the variable assignment syntax is similar to that of a Java properties file. For example:

    [curn]
    CacheFile: /home/bmc/.curn/cache
    DaysToCache: NoLimit
    ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
    ...

### Section Name Syntax

There can be any amount of whitespace before and after the brackets in a section name; the whitespace is ignored. That is, "[curn]", "[ curn]" and "[ curn ]" all specify a section named "curn".

### Variable Name Syntax

Each section contains zero or more variable settings. Similar to a Java properties file, the variables are specified as name/value pairs, separated by an equals sign ("=") or a colon (":"). Variable names are case-sensitive and may contain any printable character (including white space), other than '$', '{', and '}'. Variable values may contain anything at all. The parser ignores whitespace on either side of the "=" or ":"; that is, leading whitespace in the value is skipped. The way to include leading whitespace in a value is to escape the whitespace characters with backslashes. (See below.)

Variable definitions may span multiple lines; each line to be continued must end with a backslash ("\") character, which escapes the meaning of the newline, causing it to be treated like a space character. The following line is treated as a logical continuation of the first line; however, any leading whitespace is removed from continued lines. For example, the following four variable assignments all have the same value:

    [test]
    a: one two three
    b:      one two three
    c: one two \
       three
    d:      one \
            two \
            three

Because leading whitespace is skipped, all four variables have the value "one two three".

Only variable definition lines may be continued. Section header lines, comment lines (see below) and include directives (see below) cannot span multiple lines.

### Expansions of Variable Values

The configuration parser preprocesses each variable's value, expanding embedded metacharacter sequences and substituting variable references. (See below.) You can use backslashes to escape the special characters that the parser uses to recognize metacharacter and variable sequences; you can also use single quotes. See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details.

#### Metacharacter Expansion

Within a variable's value, Java-style ASCII escape sequences \t, \n, \r, \\, \", \', "\ " (a backslash and a space), and \uxxxx are recognized and converted to single characters.

Note that metacharacter expansion is performed before variable substitution.

#### Variable Substitution

A variable's value can interpolate the values of other variables, using a variable substitution syntax reminiscent of the Unix shell. (The syntax is also similar to the ant variable substitution syntax.) The general form of a variable reference is ${sectionName:varName}. sectionName is the name of the section containing the variable to substitute; if omitted, it defaults to the current section. varName is the name of the variable to substitute. If the variable has an empty value, an empty string is substituted. If the variable (or the referenced section) does not exist, curn will abort. If a variable reference specifies a section name, the referenced section must precede the current section. It is not possible to substitute the value of a variable in a section that occurs later in the file.
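The substitution rules above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not the parser curn uses: the names substitute, ConfigError and parsed_sections are invented here, and the pseudosections, escaping and quoting are deliberately left out. Passing only the sections parsed so far is how the sketch models the rule that a referenced section must precede the current one.

```python
import re

# Matches ${varName} or ${sectionName:varName}.
VAR_REF = re.compile(r"\$\{(?:([^:${}]+):)?([^:${}]+)\}")


class ConfigError(Exception):
    """Raised in situations where the text says curn would abort."""


def substitute(value, current_section, parsed_sections):
    """Expand variable references in value.

    parsed_sections maps section names to {variable: value} dictionaries
    and holds only the sections read so far, so a reference to a later
    section fails just like a reference to a nonexistent one.
    """
    def expand(match):
        # A missing section name means "the current section".
        section = match.group(1) or current_section
        name = match.group(2)
        try:
            return parsed_sections[section][name]
        except KeyError:
            raise ConfigError(f"cannot resolve ${{{section}:{name}}}") from None

    return VAR_REF.sub(expand, value)


parsed = {"curn": {"myCurnDir": "/home/bmc/.curn"}}
save_as = substitute("${curn:myCurnDir}/feeds/wired.rdf", "Feed_Wired", parsed)
```

Here save_as ends up as "/home/bmc/.curn/feeds/wired.rdf", while a reference such as ${later:x} raises ConfigError because no section named "later" has been parsed yet.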

The section names "system", "env", and "program" are reserved for special "pseudosections."

The "system" pseudosection is used to interpolate values from Java's system properties (the values returned by System.getProperties()). For instance, ${system:user.home} substitutes the value of the user.home system property (typically, the home directory of the user running curn). Similarly, ${system:user.name} substitutes the user's name.

The "env" pseudosection is used to interpolate values from the environment. On UNIX systems, for instance, ${env:HOME} substitutes the user's home directory (and is, therefore, a synonym for ${system:user.home}). On some versions of Windows, ${env:USERNAME} will substitute the name of the user running curn.

Note: On UNIX systems, environment variable names are typically case-sensitive; for instance, ${env:USER} and ${env:user} refer to different environment variables. On Windows systems, environment variable names are typically case-insensitive; ${env:USERNAME} and ${env:username} are equivalent.

The "program" pseudosection is a placeholder for various special variables provided by the Configuration class at runtime. Those variables are:

• cwd: The program's current working directory. Thus, ${program:cwd} will substitute the current working directory, with an appropriate path separator for the host operating system (e.g., "\" for Windows, "/" for UNIX).
• cwd.url: The program's current working directory, as a file URL, without the trailing "/". Useful when you need to create a URL reference to something relative to the current directory. This is especially helpful on Windows, where file://${program:cwd}/something.txt produces an invalid URL, with a mixture of backslashes and forward slashes. By contrast, ${program:cwd.url}/something.txt always produces a valid URL, regardless of the underlying host operating system.
• now: The current time, formatted by calling java.util.Date.toString() with the default locale. For example, ${program:now} would produce something like "Fri Aug 20 15:18:56 EDT 2004" on a machine with a default English locale.
• now delim fmt [delim lang delim country]: The current date/time, formatted with the specified java.text.SimpleDateFormat format string. If specified, the given language and country code will be used; otherwise, the default system locale will be used. lang is a Java language code, such as "en", "fr", etc. country is a 2-letter country code, e.g., "UK", "US", "CA", etc. delim is a user-chosen delimiter that separates the variable name ("now") from the format and the optional locale fields. The delimiter can be anything that doesn't appear in the format string, the variable name, or the locale.

For example:

    ${program:now|yyyy.MM.dd 'at' hh:mm:ss z}          2004.08.20 at 03:26:27 EDT
    ${program:now/yyyy.MM.dd 'at' HH:mm:ss z/en/US}    2004.08.20 at 15:28:37 EDT
    ${program:now|dd MMM, yyyy HH:mm:ss z|fr|FR}       20 août, 2004 15:30:29 EDT

Note: SimpleDateFormat requires that literal strings (i.e., strings that should not be processed as part of the format) be enclosed in quotes. For instance:

yyyy.MM.dd 'at' hh:mm:ss z

Because single quotes are special characters in configuration files, it's important to escape them if you use them inside date formats. So, to include the above string in a configuration file's ${program:now} reference, use the following:

    ${program:now/yyyy.MM.dd \'at\' hh:mm:ss z}

See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details.

For example:

${system:user.home}
Substitutes the value of the system property "user.home" (usually set to the current user's home directory). Sample:

    [curn]
    myCurnDir = ${system:user.home}/.curn

${curn:myCurnDir}
Substitutes the value of variable "myCurnDir" from the [curn] section. Sample:

    [Feed_Wired]
    URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf
    SaveAs: ${curn:myCurnDir}/feeds/wired.rdf

${myCurnDir}
Substitutes the value of variable "myCurnDir" from the current section. Sample:

    [curn]
    myCurnDir = ${system:user.home}/.curn
    CacheFile = ${myCurnDir}/cache

The configuration file also supports a simple conditional-substitution logic, which allows you to specify a default value to be substituted if a variable is empty or does not have a value. The general form of a conditional substitution is:

    ${var?some default value}

If ${var} does not have a value, or has an empty string as its value, the string "some default value" will be substituted.

#### Suppressing Metacharacter Expansion and Variable Substitution

To prevent the parser from interpreting metacharacter sequences, variable substitutions and other special characters, enclose part or all of the value in single quotes. (See [3] for additional comments.) For example, suppose you want to set variable "prompt" to the literal value "Enter value. To specify a newline, use \n." The following configuration file line will do the trick:

    prompt: 'Enter value. To specify a newline, use \n'

Similarly, to set variable "abc" to the literal string "${foo}", suppressing the parser's attempts to expand "${foo}" as a variable reference, you could use:

    abc: '${foo}'

To include a literal single quote, you must escape it with a backslash.

### Path Names

Regardless of the underlying operating system, path names in the curn configuration file can always use Unix-style forward slash ("/") characters. At runtime curn will convert the path names to use the appropriate file separator (e.g., "\" on Windows). This capability provides two benefits:

1. It enhances the portability of curn configuration files.
2. It provides a means to avoid using (and, therefore, having to escape) backslash characters in the configuration file.

### Includes

A special include directive permits inline inclusion of another configuration file. The include directive takes two forms:

%include "path"
%include "URL"


For example:

%include "/home/bmc/mytools/common.cfg"
%include "file:///home/bmc/mytools/common.cfg"


The included file may contain any content that is valid for this parser. It may contain just variable definitions (i.e., the contents of a section, without the section header), or it may contain a complete configuration file, with individual sections. Since the parser recognizes a variable syntax that is essentially identical to Java's properties file syntax, it's also legal to include a properties file, provided it's included within a valid section.

Attempting to include a file from itself, either directly or indirectly, will cause curn to abort processing.
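One common way to detect a file that includes itself, directly or indirectly, is to track the chain of files currently being expanded. The Python sketch below shows that technique; it is not curn's implementation. The function name expand_includes is invented, and a dictionary of file contents stands in for real paths and URLs so that the example is self-contained.

```python
def expand_includes(name, files, active=()):
    """Return the lines of files[name] with %include directives expanded.

    files maps a file name to its text. active is the chain of files
    currently being expanded, used to detect circular includes.
    """
    if name in active:
        chain = " -> ".join(active + (name,))
        raise ValueError(f"circular include: {chain}")
    expanded = []
    for line in files[name].splitlines():
        stripped = line.strip()
        if stripped.startswith("%include"):
            # Take the quoted path or URL that follows the directive.
            target = stripped[len("%include"):].strip().strip('"')
            expanded.extend(expand_includes(target, files, active + (name,)))
        else:
            expanded.append(line)
    return expanded


# Hypothetical file names and contents, for illustration only.
files = {
    "main.cfg": '[curn]\n%include "common.cfg"\nQuiet: true',
    "common.cfg": "MaxThreads: 5",
}
lines = expand_includes("main.cfg", files)
```

With these inputs, lines is ["[curn]", "MaxThreads: 5", "Quiet: true"]. If common.cfg in turn included main.cfg, the call would raise ValueError instead of recursing forever.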

A comment line is one whose first non-whitespace character is a "#" or a "!". This comment syntax is identical to the one supported by a Java properties file. A blank line is a line containing no content, or one containing only whitespace. Blank lines and comments are ignored. For example:

    [curn]
    # ---------------------------------------------------------------------------
    # CacheFile: The full path to the file in which curn should cache URLs.
    #            curn uses the cache file to keep track of which URLs it
    #            has already received and displayed, and when it received them.
    #            Under normal operation, curn won't display a URL it has
    #            already displayed and cached.
    #
    #            This path may contain the ~ metacharacter, to denote the
    #            invoking user's home directory.
    #
    #            The use of a cache can be disabled by omitting this parameter.
    #            Use the "NoCacheUpdate" parameter to tell curn to read,
    #            but not update, the cache.
    #
    # See also:  Configuration parameter "NoCacheUpdate"
    #            Command line parameter -C, --nocache
    #
    # OPTIONAL. Default: None

    CacheFile: test.cache

## Overview of curn's Configuration File

curn's configuration file has three kinds of sections:

• The main section, named "curn", contains global parameters. Some of these parameters can be overridden on the command line. Others can be overridden on a per-feed basis.
• Feed sections, each describing an individual RSS feed site.
• Output handler sections, each specifying an output handler that will process parsed feed data and (presumably) produce output. There can be any number of these sections.

All other sections in the configuration file are parsed (and subject to syntactic constraints), but otherwise ignored. Thus, it's perfectly legal to have a separate section, e.g., "[var]", where you define variables that exist solely to be substituted into other sections.

Any boolean parameter (i.e., one documented as taking a true or false value) can also take a value of "0" (false), "1" (true), "no" (false) or "yes" (true).
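The accepted boolean spellings amount to a small lookup. A minimal Python sketch follows; the function name parse_boolean is invented, and treating the values case-insensitively is an assumption, since the text above does not say whether case matters.

```python
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def parse_boolean(value):
    """Map a configuration string to True or False."""
    # Assumption: surrounding whitespace and letter case are ignored.
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {value!r}")
```

So "Quiet: yes", "Quiet: 1" and "Quiet: true" would all be read as the same setting.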

## The [curn] Section

This section contains global parameters. Each is described in detail, below. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)

AllowEmbeddedHTML
plug-in
Boolean Default setting for whether or not to allow embedded HTML in certain RSS feed elements, such as description, author, etc. Some RSS formats permit embedded HTML. Setting this parameter to true preserves any embedded HTML markup within a feed; setting this parameter to false causes embedded HTML to be stripped.

Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML. This global parameter can be overridden on a per-feed basis.

Notes:
• Use this parameter with care. If supported, the raw HTML is copied directly into the resulting output, without modification. With HTML output, malformed embedded HTML can screw up the resulting HTML document.
Required: No. Default: false.

CacheFile
File name or path name. The full path to the file in which curn should cache feed item data. curn uses the cache file to keep track of which feed items it has already received and displayed, and when it received them. Under normal operation, curn won't display a feed item it has already displayed and cached.

The use of a cache can be disabled by omitting this parameter. Use the NoCacheUpdate parameter, or the --no-update command line option, to tell curn to read, but not update, the cache.

The cache file is an XML file. However, since it is generated automatically, you should not edit it.
Required: No. Default: None. (If not specified, no cache is used.) See also: NoCacheUpdate, CacheBackup, --no-cache, --no-update.

CacheBackup

File name or path name.

The full path to a cache backup file. If this parameter is defined, curn will copy the cache to this backup file before updating the cache on disk.

Warning: This parameter was replaced with TotalCacheBackups in curn version 2.6.

Required: No. Default: None. See also: CacheFile, TotalCacheBackups.

CommonXMLFixups
plug-in
Boolean Enables or disables the Common XML Fixups plug-in, which attempts to fix common syntax problems in downloaded XML feeds. There is some XML badness that is surprisingly common across feeds, including (but not limited to):
• Using a "naked" ampersand (&) without escaping it.
• Use of nonexistent entities (e.g., &ouml;, &nbsp;)
• Improperly formatted entity escapes
This plug-in attempts to fix those problems.

This global parameter can be overridden on a per-feed basis. This global setting defines the default value for all feeds that don't explicitly set it themselves.
Required: No. Default: false. See also: the per-feed CommonXMLFixups setting.

DaysToCache
Positive integer Default maximum number of days to cache an already-read item. This parameter is used when the configuration section for a particular site lacks its own DaysToCache value. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless (i.e., 0 ensures that curn always forgets items that are cached). The special value "NoLimit" causes curn to leave items in the cache forever.
Required: No. Default: 365 (days). See also: per-feed DaysToCache parameter.

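The pruning rule for DaysToCache can be pictured with a short Python sketch. It models only the behavior described above and is not curn's cache code: the function name prune_cache, the dictionary layout (item identifier mapped to the time the item was cached) and the explicit now argument are assumptions made for the example.

```python
from datetime import datetime, timedelta


def prune_cache(entries, days_to_cache, now):
    """Drop entries cached more than days_to_cache days before now.

    entries maps an item identifier to the time it was cached.
    days_to_cache is an integer or the string "NoLimit".
    """
    if days_to_cache == "NoLimit":
        # Keep everything forever.
        return dict(entries)
    cutoff = now - timedelta(days=int(days_to_cache))
    return {item: cached for item, cached in entries.items() if cached >= cutoff}


# Hypothetical cache contents, for illustration only.
now = datetime(2004, 7, 22)
entries = {"old-item": datetime(2003, 1, 1), "new-item": datetime(2004, 7, 1)}
kept = prune_cache(entries, 365, now)
```

With the default of 365 days, only "new-item" survives. A value of 0 moves the cutoff to the current time, which removes every entry cached earlier and matches the description of the cache becoming useless.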
plug-in
Boolean If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving an RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter can be overridden on a per-feed basis. This global value sets the default value.

For backward compatibility, this parameter can also be specified as GetGzippedFeeds.
Required: No. Default: true.

IgnoreArticlesOlderThan
plug-in
String Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. For instance:
IgnoreArticlesOlderThan: 3 days
IgnoreArticlesOlderThan: 1 week
IgnoreArticlesOlderThan: 365 days
IgnoreArticlesOlderThan: 12 hours, 30 minutes

Valid interval names (in English) are:
• millisecond, milliseconds, ms
• second, seconds, sec, secs
• minute, minutes, min, mins
• hour, hours, hr, hrs
• day, days
• week, weeks
If you're running curn in a Spanish or French locale, the appropriate Spanish or French equivalents are also available, as well as the English versions.

"year" and "month" are not supported, to avoid the irregularity of leap years and different month lengths, respectively.

The actual conversion of the strings is done by the org.clapper.util library's Duration class. See that class for more details.

This global value sets the default value.

NOTE: The plug-in that implements this capability uses the timestamp in the XML to determine "older than", not the cached timestamp, because the intent is to weed old articles from a feed that you haven't processed in a while (or perhaps are processing for the first time.) If the article has no timestamp in the XML, it is assumed to be current, i.e., to have a date/time of "now".
Required: No. Default: None (i.e., articles are not ignored based on age). See also: per-feed IgnoreArticlesOlderThan parameter.

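To show how an interval string such as "12 hours, 30 minutes" might be interpreted, here is a small Python sketch. It is not the org.clapper.util Duration class and may accept a different grammar: the function name parse_interval is invented, only the English unit names listed above are handled, and comma-separated "number unit" pairs are assumed.

```python
import re
from datetime import timedelta

# English unit names from the list above, mapped to timedelta keywords.
UNITS = {
    "millisecond": "milliseconds", "milliseconds": "milliseconds", "ms": "milliseconds",
    "second": "seconds", "seconds": "seconds", "sec": "seconds", "secs": "seconds",
    "minute": "minutes", "minutes": "minutes", "min": "minutes", "mins": "minutes",
    "hour": "hours", "hours": "hours", "hr": "hours", "hrs": "hours",
    "day": "days", "days": "days",
    "week": "weeks", "weeks": "weeks",
}


def parse_interval(text):
    """Convert a string like "12 hours, 30 minutes" to a timedelta."""
    total = timedelta()
    for part in text.split(","):
        match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", part)
        if match is None:
            raise ValueError(f"bad interval component: {part!r}")
        amount, unit = int(match.group(1)), match.group(2).lower()
        if unit not in UNITS:
            # "year" and "month" land here, since they are not supported.
            raise ValueError(f"unsupported unit: {unit!r}")
        total += timedelta(**{UNITS[unit]: amount})
    return total
```

For example, parse_interval("1 week") and parse_interval("7 days") produce equal intervals, while parse_interval("1 month") raises ValueError.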
MailOutputTo
plug-in
String One or more comma-separated email addresses to receive the output. This parameter is optional. If any email addresses are specified, then curn sends its generated output to those addresses. Depending on the setting of the MailIndividualArticles parameter, curn either sends a single MIME multipart/alternative email with all the output, or it sends one message per article found in the feeds. See MailIndividualArticles for details.
Required: No. Default: Output is not emailed. See also: SMTPHost, SMTPLocalhost, MailFrom, MailSubject.

MailFrom
plug-in
String The email address to use as the sender, when mailing output. The address can be a full RFC 2822-compliant address (e.g., "Joe Blow <joe@example.org>") or just a simple address (e.g., "joe@example.org"). This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter.
Required: No. Default: curn constructs its own "from" address from the user name associated with the running process and the current host name. See also: SMTPHost, SMTPLocalhost, MailSubject, MailOutputTo.

MailSubject
plug-in
String The subject line to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter.
Required: No. Default: curn output. See also: SMTPHost, SMTPLocalhost, MailFrom, MailOutputTo.

MailIndividualArticles
plug-in
Boolean If set to true, this parameter instructs curn to send an email per article; that is, instead of a single email containing the output from all output handlers, curn will send one individual email for each article. If curn finds 20 unread articles, it'll send 20 email messages, each with a single article; if there are 100 unread articles, curn will send 100 separate email messages. If there are multiple output handlers that actually produce output, then each article email will be a MIME multipart/alternative email containing separate attachments from each output handler for that article.

If this parameter is false or absent, curn will send one email containing the generated output for all feeds and items. If there are multiple output handlers that actually produce output, curn will combine all the outputs into a single MIME multipart/alternative email. Each output handler's output will be a separate multipart/alternative attachment. (curn assumes that each output handler is generating an alternate form of the same information.)

Output handlers that don't generate output are skipped. If none of the configured output handlers generate any output, then curn doesn't send an email message.

This parameter is ignored if no email addresses are specified by the MailOutputTo parameter.

WARNINGS:
• Obviously, if this parameter is true, and there are lots of new articles, curn will send lots of small email messages. Use with caution.
• If the output handler supports a SaveOnly parameter (e.g., the FreeMarkerOutputHandler), and you've set the SaveOnly parameter, the output handler won't generate emailable output. Any output handler that's derived from curn's FileOutputHandler automatically supports SaveOnly.
Required: No. Default: Output is not emailed. See also: SMTPHost, SMTPLocalhost, MailFrom, MailSubject.

MaxArticlesToShow
plug-in
Integer Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. This parameter can be overridden on a per-feed basis. This global parameter sets the default value.
Required: No. Default: None (i.e., no maximum).

MaxSummarySize
plug-in
Positive integer If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter can be overridden on a per-feed basis. This global parameter sets the default value.
Required: No. Default: 0 (i.e., no limit on summary size). See also: ReplaceEmptySummaryWith.

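The truncation rule for MaxSummarySize is simple enough to sketch in Python. This is an illustration rather than the plug-in's code: the function name truncate_summary is invented, and it assumes the size is counted in characters, that the ellipsis is not counted toward the maximum, and that no attempt is made to break on a word boundary.

```python
def truncate_summary(summary, max_size):
    """Shorten summary to max_size characters, marking the cut with "..."."""
    # A maximum of 0 means "no limit".
    if max_size <= 0 or len(summary) <= max_size:
        return summary
    return summary[:max_size] + "..."
```

A summary that already fits is returned unchanged, so the ellipsis appears only when text was actually removed.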
MaxThreads
Positive integer Defines the number of concurrent download threads. If this value is greater than 1, then curn will spawn that many worker threads to handle the downloading and parsing of the RSS feeds concurrently. If this value is 1, curn will process the feeds sequentially. If this value is greater than 1, but less than the total number of feeds, some of the worker threads will end up processing more than one feed (sequentially). Values less than 1 are illegal.
Required: No. Default: 5.

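The effect of MaxThreads resembles a fixed-size worker pool. The Python sketch below shows that general technique and is not a description of curn's internals: the function name process_feeds and the fetch callable (a stand-in for downloading and parsing one feed) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_feeds(urls, fetch, max_threads=5):
    """Apply fetch to every URL using at most max_threads worker threads."""
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    # With fewer workers than feeds, each worker handles several feeds in turn.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


# str.upper is a placeholder for real download-and-parse work.
results = process_feeds(["a", "b", "c"], str.upper, max_threads=2)
```

Here two workers share three feeds, so one of them processes two feeds sequentially, as described above.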
NoCacheUpdate
Boolean If set to true (and if a cache file is specified), this parameter tells curn to read the cache file and honor its contents, but not to save the modified in-memory cache back to disk.
Required: No. Default: false. See also: CacheFile, --no-update.

ParserClass
String The full name of the underlying RSS parser class to be used. This class must implement the org.clapper.curn.parser.RSSParser interface. It can be a first-class parser of its own, or it can be nothing more than an adapter for a third party RSS parser class.

curn comes bundled with one parser:
org.clapper.curn.parser.rome.RSSParserAdapter: An adapter class that makes the Rome RSS parser work with curn. (The Rome adapter is only available if the appropriate Rome jar files are in curn's class path. Note also that Rome requires version 1.0 of the JDOM library.)

Any class that implements org.clapper.curn.parser.RSSParser may be used as a value for ParserClass.

Quiet
Boolean Normally, if an RSS feed contains no new items, most curn output handlers display the site's name and URL, followed by something like "No new items." Similarly, if curn can't contact a feed site, or if the site's XML is unparseable, curn displays an error message. This option tells curn to silently ignore sites with no data or bad XML. Setting Quiet to true tells curn to suppress both of the above displays.
Required: No. Default: false. See also: --quiet, --no-quiet.

ReplaceEmptySummaryWith
plug-in
String Tells curn what to do when the summary for a feed article is missing. Legal values:
• nothing: Leave the summary blank. This is the default.
• content: Replace the summary with the article's content, if there is any content.
• title: Replace the summary with the article's title.
This parameter can be overridden on a per-feed basis. This global value sets the default value.
No nothing Per-feed ReplaceEmptySummaryWith parameter
ShowArticlesFor String How long to show articles from feeds. If specified, this parameter is only used when individual feeds don't specify a ShowArticlesFor parameter of their own. The value is a time interval, expressed using the same natural language strings supported by the IgnoreArticlesOlderThan parameter. For instance:
ShowArticlesFor: 3 days
ShowArticlesFor: 1 week
ShowArticlesFor: 365 days
ShowArticlesFor: 12 hours, 30 minutes

Valid interval names (in English) are:
• millisecond, milliseconds, ms
• second, seconds, sec, secs
• minute, minutes, min, mins
• hour, hours, hr, hrs
• day, days
• week, weeks
If this parameter is not specified, then the default value is to show an article one time only.

NOTE: The plug-in that implements this capability uses the timestamp in the curn cache when aging an article, not the timestamp in the feed's XML. That's because the intent of this configuration parameter is to permit you to keep showing an article for a certain amount of time after the article was first displayed. The article timestamp in the XML is the time that the article was published, not the time that curn first displayed it. The time in the curn cache represents the time that curn first saw (and presumably displayed) the article.

WARNINGS:
• Specifying this parameter forces feeds to be downloaded, even if they haven't changed. curn does not keep cached copies of feed data; the only way it can redisplay an article is to download and re-parse the feed. Also, if the article is no longer in the feed, curn can't redisplay the article even if the elapsed time hasn't yet passed.
• Beware of interactions with the IgnoreArticlesOlderThan parameter. Here's a simple example. Assume the configuration settings are:
IgnoreArticlesOlderThan: 5 days
ShowArticlesFor: 2 days
In this case, any article in the feed that's older than 5 days will be discarded by the Ignore Old Articles plug-in, which will run first. Now, assume there are 4 articles:
• Article 1 has never been processed by curn (i.e., isn't in the cache), and it has an XML timestamp of 3 days ago.
• Article 2 has been processed by curn, 3 days ago. It also has an XML timestamp of 3 days ago.
• Article 3 has been processed by curn, 1 hour ago. It has no XML timestamp.
• Article 4 has been processed by curn, 3 days ago. It has an XML timestamp of 6 days ago.
Now let's see what happens when the plug-ins run. The Ignore Old Articles plug-in runs first. (It sorts higher in the plug-in list. You'll have to take my word for that, or look at the source code.)
• The Ignore Old Articles plug-in keeps Article 1, because its XML timestamp is 3 days ago, which is newer than the 5-day cutoff.
• Ditto for Article 2.
• Article 3 has no XML timestamp, so Ignore Old Articles assumes that it's current and keeps it.
• Article 4 has an XML timestamp of 6 days ago, which is past the 5-day cut-off, so Ignore Old Articles discards it.
At this point, there are three articles left. The Retain Articles plug-in runs. (That's the plug-in that handles the ShowArticlesFor parameter.)
• The Retain Articles plug-in keeps Article 1, because curn has never seen Article 1 before (i.e., it isn't in the cache).
• The Retain Articles plug-in discards Article 2, because curn first displayed the article 3 days ago, which is past the ShowArticlesFor cut-off of 2 days.
• The Retain Articles plug-in keeps Article 3, because curn first processed it an hour ago, so it's under the 2-day threshold.
In the end, two articles are left.
No 1 millisecond (i.e., show each article once) Per-feed ShowArticlesFor parameter
ShowAuthors
plug-in
Boolean If set to true, this configuration item instructs curn to display author information for each feed item, if available. This global value can be overridden on a per-feed basis. No false
ShowDates
plug-in
Boolean Some RSS feeds or the individual items within each feed contain dates (usually corresponding to the publication dates for the feed or item). If this option is set to true, then curn will display the date for each item that provides a date. This global value can be overridden on a per-feed basis. No false
SummaryOnly
plug-in
Boolean Some RSS feeds provide a description for each item, in addition to the (brief) title. Setting SummaryOnly to true suppresses display of the description. This parameter can be overridden on a per-feed basis. This global value sets the default value.

WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead.
No false ReplaceEmptySummaryWith
SMTPHost
plug-in
String The SMTP host to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. No localhost per-feed ReplaceEmptySummaryWith parameter
SMTPLocalhost
plug-in
String The name to use to identify the local host when sending email. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. No The canonical name of the local host per-feed ReplaceEmptySummaryWith parameter
SortBy
plug-in
String Default method to use to sort items within each feed. This parameter is used when the configuration section for a particular site lacks its own SortBy value. Legal values:
• time: Sort by timestamp, if present. Current time is assumed for items that don't have timestamps.
• title: Sort by item title, if present. Any item without a title is sorted as if its title were the empty string ("").
• none: Don't sort (i.e., leave items in the order they appear in the XML).
No none Per-feed SortBy parameter
TotalCacheBackups Positive integer The total number of cache backup copies to keep. If this parameter is greater than 0, then curn will keep that many numbered backups of the cache. If the cache exists when curn attempts to update it, curn will copy the existing cache to cacheFile.0. If cacheFile.0 exists, it will be moved to cacheFile.1 first, and so on down the line, until the maximum number of cache backup files exists. The newest cache is always the one without a numeric extension. The oldest file is the one with the largest numeric extension. This parameter is useful if you want to roll back to a previous cache.

If this parameter is not specified, or is 0, then no cache backups are made.
No 0 CacheFile
UserAgent
plug-in
String Specifies the default HTTP User-Agent header to use. This configuration parameter permits you to have curn masquerade as a known browser, for sites that refuse access to robots and spiders and other unknown web clients. This global value is used when the section for a particular feed does not supply its own UserAgent value. No A string that identifies curn as the user agent. Per-feed UserAgent parameter.
ZipOutputTo
plug-in
String Path to a zip file to receive all output generated by output handlers. No None
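
The interaction between IgnoreArticlesOlderThan and ShowArticlesFor, walked through in the ShowArticlesFor warning above, can be traced with a short Python sketch. This is an illustration only, not curn's implementation: the Article class, the two function names, and the fixed "now" value are hypothetical, and the functions merely mimic the documented behavior of the Ignore Old Articles and Retain Articles plug-ins.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Article:
    name: str
    published: Optional[datetime]   # timestamp from the feed's XML, if any
    first_seen: Optional[datetime]  # timestamp from the curn cache, if any


def ignore_old_articles(articles, now, cutoff):
    # Keep articles with no XML timestamp, or one inside the cutoff.
    return [a for a in articles
            if a.published is None or now - a.published <= cutoff]


def retain_articles(articles, now, show_for):
    # Keep articles never seen before, or first seen within show_for.
    return [a for a in articles
            if a.first_seen is None or now - a.first_seen <= show_for]


now = datetime(2024, 1, 10, 12, 0)  # arbitrary reference time (assumption)
articles = [
    Article("Article 1", now - timedelta(days=3), None),
    Article("Article 2", now - timedelta(days=3), now - timedelta(days=3)),
    Article("Article 3", None, now - timedelta(hours=1)),
    Article("Article 4", now - timedelta(days=6), now - timedelta(days=3)),
]

# IgnoreArticlesOlderThan: 5 days runs first, then ShowArticlesFor: 2 days.
after_ignore = ignore_old_articles(articles, now, timedelta(days=5))
shown = retain_articles(after_ignore, now, timedelta(days=2))
print([a.name for a in shown])  # ['Article 1', 'Article 3']
```

Running the filters in the opposite order would give the same result for this data, but the sketch keeps the documented order to mirror the walkthrough.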

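The backup rotation described for TotalCacheBackups can be sketched in a few lines of Python. This is a rough model of the documented behavior, not curn's code: the rotate_backups function and the "curn.cache" file name are assumptions made for the example.

```python
import os
import shutil
import tempfile


def rotate_backups(cache_path, total_backups):
    """Shift cache.0 -> cache.1 and so on, then copy cache -> cache.0."""
    if total_backups <= 0 or not os.path.exists(cache_path):
        return
    # Start at the oldest slot so nothing is overwritten too early.
    for n in range(total_backups - 1, 0, -1):
        newer = f"{cache_path}.{n - 1}"
        if os.path.exists(newer):
            os.replace(newer, f"{cache_path}.{n}")
    shutil.copy2(cache_path, f"{cache_path}.0")


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "curn.cache")  # hypothetical cache file name
    for generation in ("gen1", "gen2", "gen3"):
        rotate_backups(cache, total_backups=2)
        with open(cache, "w") as f:
            f.write(generation)
    names = sorted(os.listdir(tmp))
    with open(cache + ".0") as f:
        newest_backup = f.read()
    with open(cache + ".1") as f:
        oldest_backup = f.read()

print(names)                         # ['curn.cache', 'curn.cache.0', 'curn.cache.1']
print(newest_backup, oldest_backup)  # gen2 gen1
```

After three runs with two backups, the file without a numeric extension holds the newest data and the file with the largest extension holds the oldest, as the parameter description states.
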
The curn configuration file also contains a list of RSS feeds to be polled. Each feed must be specified in its own section in the configuration file. The name of the section must start with the string "Feed". If more than one feed is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for RSS feed sections.

• Feed (if there's only one configured feed)
• Feed1
• Feed_0
• Feed_Wired
• Feed.NYTimes.Top
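
As a minimal sketch of the naming rule above, the following Python function picks feed sections out of a list of section names and rejects duplicates. The function name and the sample list are hypothetical; curn's own configuration parser is not shown here.

```python
def feed_sections(section_names):
    """Return the names that start with "Feed"; reject duplicates."""
    feeds = [name for name in section_names if name.startswith("Feed")]
    duplicates = {name for name in feeds if feeds.count(name) > 1}
    if duplicates:
        raise ValueError(f"duplicate feed sections: {sorted(duplicates)}")
    return feeds


names = ["curn", "Feed1", "Feed_0", "Feed_Wired", "Feed.NYTimes.Top",
         "OutputHandlerHTML"]
print(feed_sections(names))
# ['Feed1', 'Feed_0', 'Feed_Wired', 'Feed.NYTimes.Top']
```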

Each feed section supports the following parameters. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)

Variable Argument type Description Required? Default Value
AllowEmbeddedHTML
plug-in
Boolean Whether or not to allow embedded HTML in certain RSS feed elements, such as description, author, etc., for this feed. Some RSS formats permit embedded HTML; setting this parameter to true tells curn output handlers that they should preserve such embedded HTML markup, if possible. If this parameter is false, any embedded HTML is stripped.

Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML.

Notes:

• Use this parameter with care. If supported, the raw HTML is copied directly into the resulting output, without modification. With HTML output, malformed embedded HTML can screw up the resulting HTML document.
• This parameter overrides the AllowEmbeddedHTML setting in the main configuration section.
No false
ArticleFilter
plug-in
Strings Specifies a set of filters to discard feed item (article) content, based on regular expressions.

The filtering syntax is (shamelessly) adapted from the rawdog RSS reader's article-filter plug-in. A feed filter is configured by adding an ArticleFilter property to the feed's configuration section. The property's value consists of one or more filter command sequences, separated by ";" characters. (The ";" must be surrounded by white space; see below.) Each filter command sequence is of this form:
show|hide [field 'regexp' [field 'regexp' ...]]
field can be one of:
• author: search the author field
• title: search the title field
• summary: search the summary, or description, field
• text: search the full content, if available
• category: search the article's category (or categories)
• any: search all fields
Each regular expression must be enclosed in single quotes. For example:
hide author 'Raymond Luxury-yacht' ; \
show author 'Arthur +.Two-sheds. +Jackson'

If the command is "hide", then the entry will be hidden if the specified field matches the regular expression. If the command is "show", then the entry will be shown if the field matches the regular expression. If there are no fields or regular expressions, then the command is a wildcard match. That is:
hide
is equivalent to:
hide any '.*'
and:
show
is equivalent to:
show any '.*'
Wildcard matches are useful in situations where you want to hide or show "everything but ...". See the examples, below, for details.

All filtering commands are processed, and the end result is what defines whether a given entry is suppressed or not. Regular expressions are matched in a case-blind fashion. The match logic also:
• ignores any embedded newlines in article contents
• (temporarily) strips all HTML from the article text before matching
You can use multiple ArticleFilter parameters per feed, as long as they have unique suffixes (e.g., ArticleFilter1, ArticleFilter2, etc.). All filters are applied to each article to determine whether the article should be filtered out or not.

Examples

Some examples will help clarify the syntax.

For example, the following command hides all articles with the phrase "mash-up" (because mash-ups bore me):
ArticleFilter: hide any 'mash[- \t]?up'

The following, more complicated, entry hides everything by author "Joe Blow", unless the title has the word "rant" in it ('cause his rants are hilarious):
ArticleFilter: hide author '^joe *blow$' ; \
show author '^joe *blow$' title 'rant'

Finally, this example hides everything except articles by Moe Howard:
ArticleFilter: hide ; show author '^moe *howard$'
No Articles are not filtered
CommonXMLFixups
plug-in
Boolean Enables or disables the Common XML Fixups plug-in, which attempts to fix common syntax problems in downloaded XML feeds. Among the corrections this plug-in makes:
• Conversion of unescaped ampersand ("&") characters
• Conversion of certain commonly seen, but nonexistent, XML entities, such as &mdash; and &ouml;
• Conversion of illegal character entities (which are usually leaked unescaped from embedded HTML text)
• "Demoronizing" text by converting Microsoft Windows-specific characters (such as smart quotes) to something that will display in any browser. (The term "demoronize" is borrowed from John Walker's demoroniser command-line Unix tool.)
This per-feed setting overrides the global default value.
No The value of the global CommonXMLFixups parameter in the [curn] section, or false, if that value is not set.
DaysToCache Positive integer Maximum number of days to cache an already-read item for this feed. This value locally overrides the global DaysToCache default in the [curn] section. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless for this feed (i.e., 0 ensures that curn always forgets items that are cached for this feed). The special value "NoLimit" causes curn to leave items in the cache forever. No The value of the global DaysToCache parameter in the [curn] section, or 365 if that value is not set.
Disabled
plug-in
Boolean If true, then the feed is skipped. If false, the feed is processed. This variable provides a simple way to disable a feed without having to comment its entire section out. No false
EditFeedURL
EditItemURL
plug-in
String Apply the specified regular expression edit to the site's feed URL (EditFeedURL) or to each of the site's RSS item URLs (EditItemURL).
The value for this option consists of a Perl 5-style substitution applied to the URL. For example:

Remove all the parameters from the URL:

's/\?.*$//'

(The PruneURLs parameter provides a simpler mechanism for this common operation.)

Remove a "redirect" CGI from a site whose URLs look like: http://www.example.com/cgi-bin/redir.cgi?http://...

's+http://www.example.com/cgi-bin/redir.cgi\?++'

The substitution syntax supports Perl's $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:

s/^([a-z]+)foo(.*)\$/\$1bar\$2/

or

's/^([a-z]+)foo(.*)$/$1bar$2/'

If there are backslashes in the string, you must escape them, as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression

s/^\*.*$//

you must specify

's/^\*.*$//'

This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:

• g: Substitute for all occurrences of the regular expression, not just the first one.
• i: Do case-insensitive pattern matching. Case-sensitive pattern matching is the default.
• m: Treat the string as consisting of multiple lines. This modifier changes the meaning of "^" and "$" so that they match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

The modifiers can be concatenated. Thus,

's/abc/xyz/ig'

will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case.

Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level.

No None
ForceEncoding String Force curn to ignore the character set encoding advertised by the remote server (if any), and use the character set specified by this configuration item, instead. This is useful in the following cases:
• the remote HTTP server doesn't supply an HTTP Content-Encoding header, and the local (Java) default encoding doesn't match the document's encoding
• the remote HTTP server supplies the wrong encoding
• the feed is coming from a file or an FTP server, and the default encoding (see below) isn't correct
This value should be a character set encoding that is recognized by the Java runtime environment. ForceCharacterEncoding is a synonym for this parameter, retained for backward compatibility.
No
• For http and https URLs, the encoding comes from the HTTP Content-Encoding header; if that header isn't present, then the Java VM's default encoding (usually "ISO-8859-1" on UNIX, and "Cp1252" on Windows) is used.
• For file URLs, the default encoding is "utf-8", the same as the default value for the SaveAsEncoding parameter.
• For all other URL types, the Java VM's default encoding is used.
GzipDownload
plug-in
Boolean If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving this RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter overrides the global GzipDownload. No true
IgnoreArticlesOlderThan
plug-in
String Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. Please see the documentation for the global IgnoreArticlesOlderThan parameter for a more complete description of this parameter. No The default, as defined by the global IgnoreArticlesOlderThan parameter. If no global IgnoreArticlesOlderThan value is set, then articles aren't ignored based on their age.
IgnoreDuplicateTitles
plug-in
Boolean If true, curn will ignore any item whose title matches the title of another item in the feed. (It only compares titles within the feed itself; it does not compare against titles of cached items.) Titles are compared without regard to upper or lower case. This feature (hack, really) is useful for sites whose feeds often contain duplicate items (with the same titles) that have different IDs and different URLs, and thus appear to be unique. (Yahoo! News feeds sometimes exhibit this trait.) No false
MaxArticlesToShow
plug-in
Integer Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. No The default, as defined by the global MaxArticlesToShow parameter. If no global MaxArticlesToShow value is set, then there is no maximum.
MaxSummarySize
plug-in
Positive integer If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter overrides the global MaxSummarySize parameter. No 0 (i.e., no limit on summary size)
PreparseEditsuffix
plug-in
String A parameter in a Feed section that starts with PreparseEdit (e.g., PreparseEdit1, PreparseEditFoo, etc.) defines a substitution to be applied to the downloaded XML file before it is parsed. As with the EditItemURL and EditFeedURL options, the value for this option consists of a Perl 5-style substitution. This capability is rarely needed, but it's sometimes useful for sites that serve unparseable, but easily fixed, XML. (Though the CommonXMLFixups capability covers a lot of these errors with less configuration.) For instance, one news site I read has an RSS channel whose title always contains an unescaped "&". The XML parser will not parse that feed; however, a simple preparse edit command of:

's/ & / \&amp; /g'

fixes the problem. (Again, this is one of the common XML syntax errors that CommonXMLFixups will correct.)

Another use for PreparseEdit is fixing incorrectly formatted links in the RSS feed. Consider the following <link> element, for fictitious site news.example.com:

<link>http://news.example.com&article=12573</link>

This is a perfectly parseable URL, but it happens to be wrong. It's missing a "/" between ".com" and "&". It really ought to be:

<link>http://news.example.com/&article=12573</link>

A quick PreparseEdit rule can fix it, though:

PreparseEdit: 's|(news.example.com)([^/]+)|$1/$2|'

Note the use of a different delimiter in the edit command ("|", instead of "/"). Any non-alphabetic character will work.
Multiple instances of this parameter are permitted, as long as each instance's name begins with the string "PreparseEdit" and contains a unique suffix.

The substitution syntax supports Perl-style $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:

s/^([a-z]+)foo(.*)\$/\$1bar\$2/

or

's/^([a-z]+)foo(.*)$/$1bar$2/'

If there are backslashes in the string, you must escape them, as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression

s/^\*.*$//

you must specify

's/^\*.*$//'

This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:

• g: Substitute for all occurrences of the regular expression, not just the first one.
• i: Do case-insensitive pattern matching. Case-sensitive pattern matching is the default.
• m: Treat the string as consisting of multiple lines. This modifier changes the meaning of "^" and "$" so that they match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

The modifiers can be concatenated. Thus,

's/abc/xyz/ig'

will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case.

Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level.

No None
PruneOriginalRSSTo
plug-in
[options] Path If set, this parameter specifies that the original, unparsed feed should be pruned to contain only new items, then written back out to the specified file. This approach differs from that of SaveAsRSS in that it operates on the raw, unparsed feed data; SaveAsRSS, by contrast, regenerates its RSS output from the parsed RSS feed data. As a result, SaveAsRSS will sometimes lose non-standard RSS XML markup. PruneOriginalRSSTo is less likely to do that, since it operates at an XML level, not an RSS level.

This configuration item takes a command line-style value:
PruneOriginalRSSTo: [--backups total_backups] [--encoding encoding] path
or
PruneOriginalRSSTo: [-b total_backups] [-e encoding] path
The parameters have the following meanings:
• total_backups specifies how many backups (i.e., previous versions) of the generated RSS file to keep. For instance, a value of 5 means "keep 5 previous versions of the file, plus the one from the current run." This is the best way to keep RSS files from previous curn runs. The backup files have version numbers preceding their extensions. For instance, if the output file is foo.xml, and total_backups is 2, curn will keep foo.0.xml and foo.1.xml. The file with the largest version number is the oldest one. If not specified, this parameter defaults to 0, which means "no backups".

• encoding is optional and specifies the desired encoding of the file. It defaults to "utf-8".

• path is the path to the file where the RSS output should be written.
No None
plug-in
Boolean If set, and if PruneOriginalRSSTo is also set, then the feed will be downloaded and parsed, and the pruned RSS output will be generated, but the feed will not be passed to any output handlers (or, for that matter, any other plug-ins). No false
PruneURLs
plug-in
Boolean Specifies that all URLs should be pruned of their HTTP parameters. This action can also be accomplished with EditItemURL and EditFeedURL directives; PruneURLs is convenient shorthand for a common operation. No None
ReplaceEmptySummaryWith
plug-in
String Tells curn what to do when the summary for a feed article is missing. Legal values:
• nothing: Leave the summary blank. This is the default.
• content: Replace the summary with the article's content, if there is any content.
• title: Replace the summary with the article's title.
This per-feed setting overrides the global setting.
No nothing
SaveAs
plug-in
[options] Path If set, this parameter specifies the path to a file where curn should save the raw XML contents of the feed, whenever it downloads the feed. This can be useful if you have a master version of curn that downloads a bunch of feeds, with multiple slave versions of curn that then run against the downloaded files. (See Being Bandwidth Friendly for a more detailed discussion of this tactic.)

This configuration item takes a command line-style value:
SaveAs: [--backups total_backups] [--encoding encoding] path
or
SaveAs: [-b total_backups] [-e encoding] path
The parameters have the following meanings:
• total_backups specifies how many backups (i.e., previous versions) of the generated RSS file to keep. For instance, a value of 5 means "keep 5 previous versions of the file, plus the one from the current run." This is the best way to keep RSS files from previous curn runs. The backup files have version numbers preceding their extensions. For instance, if the output file is foo.xml, and total_backups is 2, curn will keep foo.0.xml and foo.1.xml. The file with the largest version number is the oldest one. If not specified, this parameter defaults to 0, which means "no backups".

• encoding is optional and specifies the desired encoding of the file. It defaults to "utf-8".

• path is the path to the file where the raw RSS data should be written.
Note: Often, curn can't tell whether there's any new data in a feed without downloading it. (This is true, for instance, if the remote HTTP server doesn't supply a valid Last-Modified header, or if it doesn't honor the If-Modified-Since header.) If curn decides it has to download a feed, and the feed has a configured SaveAs value, the feed will be saved even if curn later decides there's no new data in the feed.
No None
SaveAsEncoding
plug-in
String If set, and if the SaveAs parameter is also set, then this parameter specifies the character encoding to use when saving the feed to the file. If SaveAs is not set for the feed, then any SaveAsEncoding parameter is ignored.

WARNING: This parameter is deprecated. Use the --encoding option to the SaveAs parameter, instead.
No "utf-8". Note that this default value is the same as the default value of the ForceEncoding parameter for file URLs. This makes it easy to have one instance of curn save RSS feeds for other instances to parse.
SaveOnly
plug-in
Boolean If set, and if SaveAs is also set, then the feed will be downloaded and saved, but not parsed and not included in the generated output. This parameter can be useful when Being Bandwidth Friendly. No false
SaveAsRSS
plug-in
[options] Path If set, this parameter specifies that the feed should be rewritten in the specified RSS format and saved to the specified file. This configuration item takes a command line-style value:
SaveAsRSS: [--backups total_backups] [--type rsstype] [--encoding encoding] path
or
SaveAsRSS: [-b total_backups] [-t rsstype] [-e encoding] path
The parameters have the following meanings:
• total_backups specifies how many backups (i.e., previous versions) of the generated RSS file to keep. For instance, a value of 5 means "keep 5 previous versions of the file, plus the one from the current run." This is the best way to keep RSS files from previous curn runs. The backup files have version numbers preceding their extensions. For instance, if the output file is foo.xml, and total_backups is 2, curn will keep foo.0.xml and foo.1.xml. The file with the largest version number is the oldest one. If not specified, this parameter defaults to 0, which means "no backups".

• encoding is optional and specifies the desired encoding of the file. It defaults to "utf-8".

• path is the path to the file where the RSS output should be written.
Note that only the new data in the feed is converted to RSS.
No None
plug-in
Boolean If set, and if SaveAsRSS is also set, then the feed will be downloaded and parsed, and the RSS output will be generated, but the feed will not be passed to any output handlers (or, for that matter, any other plug-ins). No false
SavedBackups Positive integer Number of saved backups to keep. If this value is non-zero, the handler will back the SaveAs file up before overwriting it. Up to SavedBackups total backed-up files will be kept. A value of 0 disables the feature. No 0
ShowArticlesFor
plug-in
String How long to show articles from the feed. The value is a time interval, expressed using the same natural language strings supported by the IgnoreArticlesOlderThan parameter. Please see the documentation for the global ShowArticlesFor parameter for a more complete description of this parameter.

This value overrides the global ShowArticlesFor parameter.
No The value of the global ShowArticlesFor parameter.
ShowAuthors
plug-in
Boolean If set to true, this configuration item instructs curn to display author information for this feed, if available. This value overrides the global ShowAuthors parameter. No The value of the global ShowAuthors parameter.
ShowDates
plug-in
Boolean If set to true, this configuration item instructs curn to display any dates associated with this feed, if available. This value overrides the global ShowDates parameter. No The value of the global ShowDates parameter.
SortBy
plug-in
String How to sort items in this feed. This value locally overrides the global SortBy parameter in the [curn] section. Legal values:
• time: Sort by timestamp, if present. Current time is assumed for items that don't have timestamps.
• title: Sort by item title, if present. Any item without a title is sorted as if its title were the empty string ("").
• none: Don't sort (i.e., leave items in the order they appear in the XML).
No The value of the global SortBy parameter in the [curn] section.
SummaryOnly
plug-in
Boolean Some RSS feeds provide a description for each item, in addition to the (brief) title. Setting SummaryOnly to true suppresses display of the description. This parameter overrides the global SummaryOnly parameter.

WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead.
No The value of the global SummaryOnly parameter.
TitleOverride
plug-in
String Specifies a string to be used as the site's title, instead of the title supplied in the RSS XML. Useful when the real site-supplied title is not suitable. No None
URL String The fully-qualified URL for the feed. For local files, use a "file:" URL. Yes None
UserAgent
plug-in
String Specifies the HTTP User-Agent header to use when retrieving this feed. This local value overrides the global UserAgent parameter in the [curn] section. This configuration parameter permits you to have curn masquerade as a known browser, and it's useful for sites that refuse access to robots and spiders and other unknown web clients. No The value of the global UserAgent parameter in the [curn] section.
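
The Perl 5-style substitutions accepted by EditFeedURL, EditItemURL, and PreparseEdit, described in the table above, can be approximated with Python's re module. The sketch below is only an approximation under stated assumptions: apply_edit is a hypothetical helper, it does not handle a delimiter that is escaped inside the pattern, and Python's regular expression dialect differs in places from the one curn uses.

```python
import re


def apply_edit(command, text):
    """Apply a Perl-style s/pattern/replacement/modifiers edit to text."""
    if len(command) < 4 or command[0] != "s":
        raise ValueError(f"not a substitution: {command!r}")
    delimiter = command[1]
    # Simplification: delimiters escaped inside the pattern are not handled.
    parts = command[2:].split(delimiter)
    if len(parts) != 3:
        raise ValueError(f"malformed substitution: {command!r}")
    pattern, replacement, modifiers = parts
    flags = 0
    if "i" in modifiers:
        flags |= re.IGNORECASE
    if "m" in modifiers:
        flags |= re.MULTILINE
    # Translate Perl's $1, $2, ... into Python's \g<1>, \g<2>, ...
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    count = 0 if "g" in modifiers else 1  # count=0 means "replace all"
    return re.sub(pattern, replacement, text, count=count, flags=flags)


fixed_link = apply_edit("s|(news.example.com)([^/]+)|$1/$2|",
                        "http://news.example.com&article=12573")
pruned = apply_edit(r"s/\?.*$//", "http://www.example.com/story?id=7&ref=rss")
print(fixed_link)  # http://news.example.com/&article=12573
print(pruned)      # http://www.example.com/story
print(apply_edit("s/abc/xyz/ig", "ABC abc Abc"))  # xyz xyz xyz
```

The strings passed to apply_edit are the values as curn would see them after configuration-file quoting has been processed, so the "$" escaping rules described above do not apply inside the sketch.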

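The ArticleFilter rules from the feed parameter table can be modeled in Python as follows. This is a sketch under explicit assumptions, not the plug-in's actual logic: it assumes that every field/regexp pair within one command must match, and that the last matching command decides the outcome. The function names and the dictionary-based article representation are hypothetical, and the sketch skips the HTML stripping and newline handling that the real matching performs.

```python
import re
import shlex

FIELDS = ("author", "title", "summary", "text", "category")


def parse_filters(value):
    """Split an ArticleFilter value into (command, [(field, regexp)]) tuples."""
    commands = []
    for chunk in re.split(r"\s+;\s+", value.strip()):
        tokens = shlex.split(chunk)  # honors the single-quoted regexps
        command, rest = tokens[0], tokens[1:]
        if command not in ("show", "hide") or len(rest) % 2:
            raise ValueError(f"bad filter command: {chunk!r}")
        # A bare "show" or "hide" is a wildcard: any '.*'
        pairs = list(zip(rest[0::2], rest[1::2])) or [("any", ".*")]
        commands.append((command, pairs))
    return commands


def matches(article, field, pattern):
    # "any" searches every field; matching ignores case.
    names = FIELDS if field == "any" else (field,)
    return any(re.search(pattern, article.get(name, ""), re.IGNORECASE)
               for name in names)


def is_shown(article, commands):
    shown = True  # unfiltered articles are shown
    for command, pairs in commands:
        # Assumption: all pairs in a command must match the article.
        if all(matches(article, f, p) for f, p in pairs):
            shown = (command == "show")  # assumption: last match wins
    return shown


joe = parse_filters(
    r"hide author '^joe *blow$' ; show author '^joe *blow$' title 'rant'")
moe = parse_filters("hide ; show author '^moe *howard$'")

rant = {"author": "Joe Blow", "title": "My rant about XML"}
notes = {"author": "Joe Blow", "title": "Release notes"}
hello = {"author": "Moe Howard", "title": "Hello"}

print(is_shown(rant, joe), is_shown(notes, joe), is_shown(hello, joe))
# True False True
print(is_shown(rant, moe), is_shown(hello, moe))
# False True
```
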
## Configuring Output Handlers

### Output Handler Sections

As curn processes each RSS feed, it parses the XML and loads the new items into internal data structures. When it has finished processing the XML, it hands the parsed data structures to one or more output handlers. Output handlers are so called because they generally produce output that's to be displayed or emailed to the user; generally, but not always. An output handler may choose to save its output to a file, but not send the output back to curn; each of the built-in output handlers does exactly that if its SaveAs configuration parameter is set and its SaveOnly configuration parameter is true. Alternatively, the output handler may choose to convert the internal data structures to output that it publishes somewhere (e.g., via a network connection to an HTTP server).

Each output handler is specified in its own section in the configuration file. The name of the section must start with the string "OutputHandler". If more than one output handler is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for output handler sections.

• OutputHandler (if there's only one output handler)
• OutputHandlerHTML
• OutputHandlerText
• OutputHandler1
• OutputHandler_asdf

If no OutputHandler sections are present in the configuration file, curn skips the RSS XML parsing phase. (There's no reason to parse the XML if there are no output handlers to process the parsed feed data.) If there are no output handlers, curn may or may not download individual feeds. If a given feed has no SaveAs setting, and there are no output handlers, then curn skips the feed entirely. After all, there's no sense wasting time downloading the feed if the feed isn't being parsed or saved. However, if the feed does have a SaveAs setting, curn will download and save the XML (assuming it has changed) even if XML parsing is disabled.

All output handler sections take two variables. In addition, individual output handlers can require configuration items of their own. The two variables common to all output handlers are described below.

| Variable | Argument type | Description | Required? | Default Value |
|---|---|---|---|---|
| Class | String | Identifies the Java class that implements the output handler. (The class must implement the org.clapper.curn.OutputHandler interface. See Writing Your Own Output Handler for details.) | Yes | |
| Disabled | Boolean | If true, the output handler is skipped. If false, the output handler is processed. This variable provides a simple way to disable an output handler without having to comment its entire section out. | No | false |

There are some output handler examples following the next section.

### Predefined Output Handlers

curn comes bundled with the following built-in output handlers.

#### FreeMarkerOutputHandler

Class: org.clapper.curn.output.freemarker.FreeMarkerOutputHandler

Purpose: Uses the FreeMarker template engine and a configured template to generate output.

##### Using the FreeMarker output handler

The FreeMarkerOutputHandler, introduced in curn version 2.6, is both simple and flexible. It uses the FreeMarker template engine to convert a template to an output file. FreeMarker templates can be used to generate nearly any kind of textual output file, from HTML and XML to simple text. In fact, the HTMLOutputHandler, TextOutputHandler, and SimpleSummaryOutputHandler have been reimplemented to use the FreeMarkerOutputHandler in conjunction with built-in templates that produce the appropriate kind of output.

| Variable | Argument type | Explanation | Required? | Default value |
|---|---|---|---|---|
| AllowEmbeddedHTML | Boolean | Whether or not the specified template supports embedded HTML. If embedded HTML is found within an RSS item, it will be included in the generated output only if (a) this parameter is true, and (b) the AllowEmbeddedHTML parameter for the feed is also true. Otherwise, embedded HTML will be stripped from the item. | No | false |
| Encoding | String | Specifies the character encoding to use when writing the output file. | No | "utf-8" |
| SaveAs | File name or path name | Save a copy of the generated output to the specified file. The argument is the path to the file. WARNING: The syntax of this parameter is different from the syntax of the SaveAs parameter for a feed. | No | None (i.e., no copy is saved) |
| SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated output, but don't make it available to the user (i.e., don't display it on standard output, and don't email it). | No | false |
| ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. | No | true |
| TemplateFile | Two or three strings | Specifies the location of the FreeMarker template file. The syntax is described in the notes that follow this table. | Yes | |
| Title | String | If set, this string overrides the title and the topmost heading in the generated HTML. | No | RSS Feeds |
| TOCItemThreshold | Positive integer | The total number of items (not feeds, but individual items) that must be displayed before curn will generate a table of contents header in the HTML. A value of 0 causes curn to generate a table of contents regardless of how many items are displayed. | No | A very large number, which effectively disables the table of contents entirely. |

The TemplateFile location is specified with three parameters:

• the type, which may be file, classpath, url or builtin
• an identifier string
• a MIME type for the generated output. This parameter, if omitted, defaults to "text/plain"

The form of the identifier string depends on the type value.

builtin: The identifier specifies one of the built-in curn FreeMarker templates that are bundled in the curn jar file. There are three legal values:

• html: the HTML template, which generates HTML output
• summary: a plain text template that generates a simple text summary
• text: a template that generates output containing the same information as the HTML template, but in plain text form

Examples:

    TemplateFile: builtin html
    TemplateFile: builtin summary

classpath: The identifier must be a relative path to a template file that can be found by searching the directories and jar files in the Java classpath. Examples:

    TemplateFile: classpath org/clapper/curn/output/freemarker/HTML.ftl
    TemplateFile: classpath com/example/mycurn/output/text.ftl

file: The identifier must be the path (relative or absolute) to the template file. Examples:

    TemplateFile: file 'C:\curn\html.ftl'
    TemplateFile: file "${system:user.home}\\curn\\text.ftl"
    TemplateFile: file ../.curn/summary.ftl

NOTE: Note the use of single quotes in the first example; they escape the special meaning of the backslash character. If there is any chance that the file name will contain white space (either before or after variable substitution), you must enclose it in either single or double quotes. Otherwise, the white space in the file name will confuse curn. (If the value contains a variable substitution or metacharacters, then you must use double quotes, since variable and metacharacter expansion don't occur within single quotes.) So, in the second example above, ${system:user.home}\\curn\\text.ftl is double-quoted, in case the home directory is something like C:\My Documents.

url: The identifier must be a URL that specifies the location of the FreeMarker template to use. Examples:

    TemplateFile: url http://localhost/curn-templates/html.ftl
    TemplateFile: url ftp://ftp.example.com/curn-templates/text.ftl
    TemplateFile: url "${program:cwd.url}/curn/bin/html.ftl"

Note the use of double quotes in the third example. As with the file type, if there is any chance that the URL will contain white space (either before or after variable substitution), you must enclose it in either single or double quotes. Otherwise, the white space in the URL will confuse curn. So, in that last example above, ${program:cwd.url}/curn/bin/html.ftl might very well expand to file:/C:/Program Files/clapper.org/curn/bin/html.ftl on a Windows system. Double-quoting it is necessary, because of the blank in Program Files.

You can also write your own FreeMarker template, to change the output format. See the subsection entitled Writing Your Own FreeMarker template, in the Extending curn section, below.
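To make the parameters above more concrete, here is a minimal sketch of an output handler section that uses the FreeMarkerOutputHandler with the built-in HTML template. The section name, the save path, and the title text are illustrative assumptions. The placement of the optional MIME type as a third token after the identifier is also an assumption, based on the three-parameter description of TemplateFile; the examples in this guide show only the first two tokens.

```ini
[OutputHandlerFreeMarker]
Class: org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
# Type "builtin", identifier "html". The trailing MIME type is optional;
# its position as a third token is assumed here, not confirmed.
TemplateFile: builtin html text/html
# Hypothetical location for the saved copy of the output.
SaveAs: /usr/local/www/htdocs/rss/news.html
Title: My RSS Feeds
# 0 means: generate a table of contents regardless of the item count.
TOCItemThreshold: 0
```

If the MIME type token turns out not to be accepted in this position, dropping it leaves the documented two-token form, TemplateFile: builtin html.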

#### HTMLOutputHandler

Class: org.clapper.curn.output.html.HTMLOutputHandler

Purpose: Produces an HTML description of the new RSS items. You can find a sample of its output here.

Note: Prior to version 2.6, this output handler used custom code and the XMLC library to generate its HTML output. However, that approach made it difficult to customize the HTML output for individual sites. As of curn version 2.6, the HTMLOutputHandler is implemented in terms of the FreeMarkerOutputHandler. The HTMLOutputHandler class uses the FreeMarkerOutputHandler's built-in HTML template, so the output from the HTMLOutputHandler class is identical to the output from the FreeMarkerOutputHandler class when the "builtin html" template is used. You can continue to use the HTMLOutputHandler class, but it is no longer maintained. Plus, the FreeMarkerOutputHandler class is more flexible.

| Variable | Argument type | Explanation | Required? | Default value |
|---|---|---|---|---|
| HTMLEncoding | String | Specifies the character encoding to use when writing the HTML. The encoding will be stored in an HTML <META> tag, and it will be used by the Java runtime when opening the output file (to ensure proper translation of characters from the in-memory Unicode character set). This parameter is mapped to the FreeMarkerOutputHandler's Encoding parameter. | No | "utf-8" |
| SaveAs | File name or path name | Save a copy of the generated HTML to the specified file. The argument is the path to the file. | No | None (i.e., no copy is saved) |
| SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated HTML, but don't make it available to the user (i.e., don't display it on standard output, and don't email it). | No | false |
| ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated HTML. | No | true |
| Title | String | If set, this string overrides the title and the topmost heading in the generated HTML. | No | RSS Feeds |
| TOCItemThreshold | Positive integer | The total number of items (not feeds, but individual items) that must be displayed before curn will generate a table of contents header in the HTML. A value of 0 causes curn to generate a table of contents regardless of how many items are displayed. | No | A very large number, which effectively disables the table of contents entirely. |
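Since the HTMLOutputHandler is described as a wrapper around the FreeMarkerOutputHandler's built-in HTML template, an existing HTMLOutputHandler section can probably be migrated by switching the class, adding a TemplateFile line, and renaming HTMLEncoding to Encoding. The sketch below shows both forms side by side, with the newer one disabled; the section names and the title text are made up for illustration.

```ini
# Existing section, using the HTMLOutputHandler.
[OutputHandlerHTML]
Class: org.clapper.curn.output.html.HTMLOutputHandler
HTMLEncoding: utf-8
Title: Daily News

# Intended equivalent, using the FreeMarkerOutputHandler. Disabled here so
# that only one of the two handlers produces output; to switch over, set
# Disabled to true in the section above and to false in this one.
[OutputHandlerHTMLViaFreeMarker]
Class: org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
TemplateFile: builtin html
Encoding: utf-8
Title: Daily News
Disabled: true
```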

#### TextOutputHandler

Class: org.clapper.curn.output.TextOutputHandler

Purpose: Produces a plain text description of the new RSS items. You can find a sample of its output here.

Note: As of curn version 2.6, this output handler is implemented in terms of the FreeMarkerOutputHandler. It forces use of the FreeMarkerOutputHandler's built-in "text" template. You can continue to use the TextOutputHandler class, but it is no longer maintained. Plus, the FreeMarkerOutputHandler class is more flexible.

| Variable | Argument type | Explanation | Required? | Default value |
|---|---|---|---|---|
| SaveAs | File name or path name | Save a copy of the generated text to the specified file. The argument is the path to the file. | No | None (i.e., no copy is saved) |
| SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated text, but don't make it available to the user. | No | false |
| ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. | No | true |

#### SimpleSummaryOutputHandler

Class: org.clapper.curn.output.SimpleSummaryOutputHandler

Purpose: Produces a plain text summary of the new RSS items. For each RSS feed that has new items, this output handler shows the feed's name, its URL, and the number of new items. It does not show the items themselves. You can find a sample of its output here.

Note: As of curn version 2.6, this output handler is implemented in terms of the FreeMarkerOutputHandler. It forces use of the FreeMarkerOutputHandler's built-in "summary" template. You can continue to use the SimpleSummaryOutputHandler class, but it is no longer maintained. Plus, the FreeMarkerOutputHandler class is more flexible.

| Variable | Argument type | Explanation | Required? | Default value |
|---|---|---|---|---|
| SaveAs | File name or path name | Save a copy of the generated text to the specified file. The argument is a relative or absolute path to the file where the output should be saved. | No | None (i.e., no copy is saved) |
| SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated text, but don't make it available to the user. | No | false |
| Message | String | Static text that is to be included in the output. The text appears right after the heading line and before the actual summary of the RSS feeds. The sample output was created using a Message value that points to a URL, where (presumably) the output from the HTML handler has been saved. See Example 3, below. | No | Nothing |
| ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. | No | true |

#### ScriptOutputHandler

Class: org.clapper.curn.script.ScriptOutputHandler

Purpose: Provides an output handler that calls a script via the Apache Jakarta Bean Scripting Framework (BSF) or the Java 6 (JSR 223) javax.script scripting framework. (The Java 6 scripting framework is only available if you're running curn via Java 6.) By default, the ScriptOutputHandler first tries to use the javax.script infrastructure; if that doesn't work, it tries to use the BSF infrastructure. This handler supports any scripting language supported by the underlying scripting infrastructure. For complete details on writing a script output handler, see Writing a Script Output Handler, below.

| Variable | Argument type | Explanation | Required? | Default value |
|---|---|---|---|---|
| Script | File name or path name | Path to the script to be invoked. The script will be called once, as if from the command line, except that additional global objects will be available via BSF. | Yes | None |
| Language | String | The scripting language, as recognized by BSF. This handler supports all the scripting language engines that are built into the BSF distribution. Of course, the jar files for the scripting languages themselves must be available at runtime, for those languages to be available. (See the section entitled Installing Support Software for details.) Some of the available values are listed after this table. | Yes | false |
| SaveAs | File name or path name | Save a copy of the generated text to the specified file. The argument is a relative or absolute path to the file where the output should be saved. | No | None (i.e., no copy is saved) |
| SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated text, but don't make it available to the user. | No | false |
| ScriptingAPI | String | Specifies which scripting infrastructure to use. Legal values are: bsf (use Bean Scripting Framework, and abort if it's not available) and javax.script (use JSR 223 Java 6 scripting framework, and abort if it's not available). | No | Default behavior: curn first tries to use the JSR 223 (javax.script) infrastructure; if that doesn't work, it tries to use BSF. If neither framework is available, and a ScriptOutputHandler is specified, curn aborts. |
| ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. Note: There's no guarantee that a given script will honor this setting. | No | true |

The following values represent some of the languages available for the Language parameter. The BSF values come from the Languages.properties file distributed with BSF version 2.3.0. The JSR 223 languages are the set of languages supported by the JSR 223 engines at https://scripting.dev.java.net/, as of the date this document was last updated. Consult that web site for details on available JSR 223 languages.

NOTE: In all cases, except Rhino, the actual script language itself does not come with the scripting infrastructure; you have to download the language separately. The scripting infrastructure merely contains bindings to (a.k.a., engines for) the various supported scripting languages.

| Language | curn ScriptOutputHandler Language parameter values for BSF | curn ScriptOutputHandler Language parameter values for JSR 223 |
|---|---|---|
| AWK (via Jawk) | | jawk |
| BeanShell | java | beanshell |
| Groovy | groovy * | groovy |
| Jacl (Java TCL) | jacl | jacl |
| Javascript, via the Mozilla Rhino engine | javascript | javascript |
| JudoScript | judoscript | judoscript |
| Jython | jython | jython |
| NetRexx | netrexx | |
| Pnuts | pnuts * | pnuts |
| Ruby (via JRuby) | ruby | ruby |
| XSLT Stylesheets (as a component of Apache XML project's Xalan and Xerces) | xslt | xslt |

\* Requires third-party engine implementation available from the language web site.
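As a sketch of how the ScriptOutputHandler parameters fit together, the section below asks curn to run a Jython script through BSF. The section name and the script path are hypothetical; the Language and ScriptingAPI values are taken from the lists above.

```ini
[OutputHandlerScript]
Class: org.clapper.curn.script.ScriptOutputHandler
# Hypothetical path to a Jython script that formats the parsed feeds.
Script: /usr/local/curn/scripts/summary.py
Language: jython
# Use the Bean Scripting Framework, and abort if it's not available.
# Omit this line to get the default behavior (javax.script first, then BSF).
ScriptingAPI: bsf
```

Remember that the Jython jar file itself must be available at runtime, since the scripting infrastructure supplies only the bindings.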

### Examples

Example 1: The output handler sections from a curn configuration file that produces HTML output. If curn is called with email addresses, the HTML output will be mailed to the specified email addresses. The HTML output is not saved anywhere.

    [OutputHandler]
    Class: org.clapper.curn.output.html.HTMLOutputHandler
    Disabled: false

Example 2: The output handler sections from a curn configuration file that produces HTML output and plain text output. If curn is called with email addresses, the text output and the HTML output will be mailed to the specified email addresses as "multipart/alternative" attachments. The output is not saved anywhere.

    [OutputHandlerText]
    Class: org.clapper.curn.output.TextOutputHandler

    [OutputHandlerHTML]
    Class: org.clapper.curn.output.html.HTMLOutputHandler

Example 3: The output handler sections from a curn configuration file that produces HTML output to a file (but not to the user), and displays (or emails) the user a text summary that contains a link to the HTML file.

    [OutputHandlerSummary]
    Class: org.clapper.curn.output.SimpleSummaryOutputHandler
    # Message assumes that generated HTML is available via the web server running
    # on internal machine "foo", at the specified location.
    Message: See http://foo/rss/news.html

    [OutputHandlerHTML]
    Class: org.clapper.curn.output.html.HTMLOutputHandler
    # Below path is assumed to correspond to URL http://foo/rss/news.html
    SaveAs: /usr/local/www/htdocs/rss/news.html
    SaveOnly: true

Example 4: The feed and output handler sections from a curn configuration file that retrieves and downloads XML feeds, caching them in a known location, without displaying them. (See Being Bandwidth Friendly for reasons why you might want to do this.)

    [vars]
    feedDir: ${system:user.home}/.curn/feeds

    [curn]
    ...

    [FeedWired]
    URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf
    SaveAs: ${vars:feedDir}/wired.rdf

    [Feed_yahoo_top]
    URL: http://rss.news.yahoo.com/rss/topstories
    SaveAs: ${vars:feedDir}/yahoo_top_stories.xml

    [Feed_cnn_top]
    URL: http://csociety.purdue.org/~jacoby/XML/CNN_TOP_STORIES.xml
    SaveAs: ${vars:feedDir}/CNN_TOP_STORIES.xml

# Having curn Run on a Schedule

You can run curn manually, at the command line, whenever you feel like checking your news feeds. However, this is less than useful. The best way to run curn is to have your computer's background scheduler process (e.g., cron(8) on UNIX-like systems, or the Windows Scheduler on Windows) run curn for you automatically, every so often.

## Running curn from cron(8) on UNIX-like Systems

I typically run curn three times a day, via cron, on weekdays and once on weekends, and I have it mail me the output. I use my personal crontab, rather than the system-wide /etc/crontab file. Here's a sample crontab entry that does this:

    0 8,12,16 * * 1-5 /usr/local/curn/bin/curn $HOME/.curn/my.cfg
    0 16 * * 0,6 /usr/local/curn/bin/curn $HOME/.curn/my.cfg

## Running curn via the Windows Scheduler

Currently, this task is left as an exercise to the reader. (I don't use Windows often enough to play with the scheduler, so I haven't gotten around to running curn that way. When I do find the time and inclination to experiment with running curn from the Windows Scheduler, I'll update this section. Unless, of course, someone else wants to supply the relevant details...)

# Being "Bandwidth Friendly"

When pulling down RSS documents from remote HTTP servers, curn does its best to minimize the amount of bandwidth it consumes. By default, it uses the following strategies to do so (though most can be overridden by configuration parameters and command-line parameters).

• Uses the cache to determine how to set the HTTP If-Modified-Since header, instructing the remote HTTP server to send the RSS document only if it has changed since the last time curn retrieved it. (The HTTP server isn't required to honor that header.)
• Requests that the remote HTTP server compress the RSS data in transit, by setting the HTTP Accept-Encoding header to gzip. Again, the HTTP server isn't required to honor that header; some do, some don't.

But there are other things you can do to be polite to remote HTTP servers.

#### Don't Run curn Too Often

I run curn three times a day. In practice, that's more than sufficient to keep up with the daily news feeds I want to read. Your needs may vary, but if you're using curn to poll remote RSS feeds every five minutes, you probably fall into the "impolite RSS feed user" category.

#### Consolidate Common Feeds

Suppose you have a number of users, all of whom run curn several times a day. Further suppose that there's significant commonality in the RSS feeds that they want to read. Rather than have each user poll the common remote HTTP servers individually, you could run a single instance of curn that downloads and saves those feeds several times a day. You could then instruct the individual users to point their curn configuration files at the local copies of the RSS feeds, instead of the remote ones.

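For example, suppose the common feeds are the following:

• http://www.example.org/rss/nytimes_front_page.xml
• http://www.example.org/rss/bbc_world_news.rdf
• http://www.example.org/rss/big_jimmys_blog.xml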


You could run curn periodically with the following configuration file, to download each of those feeds without producing any output.

    [var]
    # "feedDir" dumps to a directory that's accessible internally via URL
    # http://hub.ourdivision.example.com/rssfeeds/
    feedDir: /usr/local/apache/htdocs/rssfeeds
    # curnDir: where this file and the cache live
    curnDir: /usr/local/etc/curn

    [curn]
    CacheFile: ${var:curnDir}/common.cache
    MaxThreads: 15
    ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
    GzipDownload: true

    #####################################
    # No output handlers are configured #
    #####################################

    [Feed_nytimes_front]
    # New York Times front page. Accessible internally as:
    # http://hub.ourdivision.example.com/rssfeeds/nytimes_front_page.xml
    URL: http://www.example.org/rss/nytimes_front_page.xml
    SaveAs: ${var:feedDir}/nytimes_front_page.xml

    [Feed_bbc_news_world]
    # BBC World News page. Accessible internally as:
    # http://hub.ourdivision.example.com/rssfeeds/bbc_world_news.rdf
    URL: http://www.example.org/rss/bbc_world_news.rdf
    SaveAs: ${var:feedDir}/bbc_world_news.rdf

    [Feed_big_jimmy]
    # Big Jimmy's Blog. (Why is this blowhard so popular?)
    # Accessible internally as:
    # http://hub.ourdivision.example.com/rssfeeds/big_jimmys_blog.xml
    URL: http://www.example.org/rss/big_jimmys_blog.xml
    SaveAs: ${var:feedDir}/big_jimmys_blog.xml
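On the consuming side, each user's curn configuration file would then refer to the internal copies rather than the remote feeds. A sketch of the corresponding feed sections follows, using the internal URLs named in the comments of the configuration file above; the section names are arbitrary, and the rest of the user's configuration (the [curn] section and any output handlers) is omitted.

```ini
[Feed_nytimes_front]
URL: http://hub.ourdivision.example.com/rssfeeds/nytimes_front_page.xml

[Feed_bbc_news_world]
URL: http://hub.ourdivision.example.com/rssfeeds/bbc_world_news.rdf

[Feed_big_jimmy]
URL: http://hub.ourdivision.example.com/rssfeeds/big_jimmys_blog.xml
```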

Note that you could use curn in this manner even if your users are not using curn to read their RSS feeds. You could still run your periodic instance of curn to download the common feeds to a directory that's part of an internal web site, and instruct your users to point whatever RSS readers they're using to those internal web pages, instead of the external ones.

# Extending curn

This section is intended for Java programmers who want to extend curn's capabilities by writing additional output handlers, integrating a different RSS parser, or even writing a new command-line or GUI front-end to curn's main logic.

Roughly speaking, curn's processing is divided into the following phases:

1. startup: Initialization
2. configuration: Read the configuration file (which includes the set of RSS feeds, the output handlers to use, the RSS parser to use, various cache-control values, and other things), load the on-disk cache into memory, and initialize various data structures.
3. feed download: Download the configured RSS feeds.
4. feed parsing: Parse the downloaded RSS feeds, assuming a parser class and at least one output handler have been configured. (If there's no parser, or no output handlers, then curn is running in download-only mode, and there's no need to waste time parsing the XML.)
5. output: Pass the parsed feed data to the configured output handlers
6. shutdown: Cleanup.

Since curn permits you to specify the parsing and output handler classes in its configuration file, you can easily extend curn's capabilities by writing your own output handler or integrating a different RSS parser. In addition, as of version 3.0, curn supports general-purpose plug-ins that can intercept various phases of curn processing.

## Using an Unsupported RSS Parser

Suppose you've found (or written) an RSS parser that you prefer to use instead of ROME. (Perhaps it's faster than ROME, or perhaps it supports some new RSS syntax that ROME does not support. Or, perhaps you're just playing around.) Integrating that parser with curn requires writing some adapter classes that implement some interfaces and extend some classes provided with the curn software. Those classes are:

| Class | Abstract Class or Interface? | Description |
|---|---|---|
| org.clapper.curn.parser.RSSParser | Interface | Defines a simplified view of an RSS parser. Classes implementing this interface must provide a default public constructor and a parseRSSFeed() method. |
| org.clapper.curn.parser.RSSItem | Abstract class | Defines a simplified view of an RSS item (a.k.a., one of the items within a parsed feed). The RSSChannel class's getItems() method returns a collection of objects that extend the RSSItem class. |

Typically, integrating a new parser means writing a set of adapter classes that implement the above interfaces or extend the above classes, and map the necessary calls onto methods in the real underlying parser. For a sample integration, see the classes in the curn source code package org.clapper.curn.parser.rome. Those classes implement a simple adapter for the third-party ROME RSS parser.

Note: If you've written an adapter for an unsupported RSS parser engine, you'll have to make your adapter classes and the RSS parser classes available to curn. It's not sufficient to add the appropriate jar files to your CLASSPATH environment variable. Please refer to the section entitled Installing Supporting Software for details.

## Plug-ins

As of version 3.0, curn supports plug-ins. curn plug-ins can intercept various phases of curn processing and can enhance or modify curn's behavior. This section discusses curn's plug-in support.

### Overview of Plug-In Support

If invoked properly (e.g., via the curn shell script or the curn.bat DOS script created by the curn installer), curn will search for and load plug-ins before it begins doing its real work. curn looks for plug-ins in the following directories:

• curn_home/plugins
• user_home/curn/plugins
• user_home/.curn/plugins

curn searches all jar files, zip files, and subdirectories in each of those directories, looking for any non-abstract, public class that implements one or more of the curn Java plug-in interfaces. curn then attempts to load and instantiate each plug-in. Once a plug-in has been instantiated, its capabilities are available. Most plug-ins are dormant; that is, they don't do anything unless activated by a plug-in-specific configuration entry.

After curn loads all the plug-ins it can find, it sorts them by "sort key" (a special field that each plug-in is required to provide), using a case-insensitive comparison. All plug-ins within a given execution phase are, therefore, invoked in alphabetical order by sort key. This bit of processing trivia is useful if you need to ensure that one plug-in fires before another plug-in.

### Stock Plug-ins

curn ships with a set of stock plug-ins. Some of those plug-ins implement capabilities that were formerly in the curn core code; others provide new functionality. These plug-ins are automatically available (provided you don't delete the curn-plugins.jar file that's shipped with curn). The following table summarizes the stock plug-ins. All stock plug-ins are in the org.clapper.curn.plugins package.

| Plug-in Name | Class name | Explanation | Plug-in Configuration Parameters |
|---|---|---|---|
| Allow Embedded HTML | AllowEmbeddedHTMLPlugIn | Enables or disables embedded HTML in a feed's output. | AllowEmbeddedHTML |
| Article Filter | ArticleFilterPlugIn | Filter items (articles) from a feed, based on regular expressions. | ArticleFilter |
| Common XML Fixups | CommonXMLFixupsPlugIn | Fix some common syntax errors in downloaded XML. See the configuration parameters for details on these fixups. | CommonXMLFixups |
| Disable Feed | DisableFeedPlugIn | Disable a feed, without having to comment it out or remove it from the configuration. | Disabled (feed) |
| Disable Output Handler | DisableOutputHandlerPlugIn | Disable an output handler, without having to comment it out or remove it from the configuration. | Disabled (output handler) |
| Edit Parsed Feed URL | ParsedFeedURLEditPlugIn | Edit the parsed feed data, changing the feed URL and/or the individual item (article) URLs. | EditFeedURL, EditItemURL, PruneURLs |
| Email Output | EmailOutputPlugIn | Email any output created by the output handlers to one or more recipients. | SMTPHost, MailOutputTo, MailFrom, MailSubject, MailIndividualArticles |
| Empty Article Summary | EmptyArticleSummaryPlugIn | How to handle an empty summary in an article. | ReplaceEmptySummaryWith |
| Feed Max Summary Size | FeedMaxSummarySizePlugIn | Truncate a feed's summary to a maximum number of characters. | MaxSummarySize |
| Feed Summary Only | FeedSummaryOnlyPlugIn | Optionally strips the full content for a feed, leaving only the summary. | SummaryOnly |
| Ignore Old Articles | IgnoreOldArticlesPlugIn | Suppress articles in a feed that are older than a specified interval. | IgnoreArticlesOlderThan |
| Ignore Duplicate Articles | IgnoreDuplicateArticlesPlugIn | Suppress duplicate articles in a feed, based on a comparison of the article titles. | IgnoreDuplicateTitles |
| Max Articles | MaxArticlesPlugIn | Limits the number of articles displayed for a feed. | MaxArticlesToShow |
| Override Feed Title | TitleOverridePlugIn | Overrides the title of a feed. | TitleOverride |
| Raw Feed Edit | RawFeedEditPlugIn | Apply regular expression edits to a feed's XML before it's parsed. | PreparseEdit |
| Retain Articles | RetainArticlesPlugIn | Retains already-seen articles for a specified time. | ShowArticlesFor |
| Save As | RawFeedSaveAsPlugIn | Save a feed's XML to a file. | SaveAs, SaveAsEncoding, SaveOnly |
| Save As RSS | SaveAsRSSPlugIn | Convert any new data in the feed to a specified RSS format, and save the result to a file. | SaveAsRSS |
| Show Authors | ShowAuthorsPlugIn | Enable or disable the display of the author(s) of a feed and a feed's articles. | ShowAuthors |
| Show Dates | ShowDatesPlugIn | Enable or disable the display of the dates for a feed and a feed's articles. | ShowDates |
| Sort Articles | SortArticlesPlugIn | Control how a feed's articles are sorted. | SortBy |
| Zip Output | ZipOutputPlugIn | Zip any output created by the output handlers into a configured zip file. | ZipOutputTo |

### Installing Plug-ins

Installing a custom plug-in (i.e., not one of curn's stock plug-ins) is simple: Copy the plug-in's jar or zip file to one of the directories listed in the Overview of Plug-In Support section.

### Writing curn Plug-ins

#### curn Plug-in API

curn automatically invokes plug-ins at various phases of its execution. A plug-in that is registered for a particular phase will be called during that phase of processing. A given plug-in class can be associated with multiple phases of execution. In fact, most are, if only to permit them to intercept their plug-in-specific configuration parameters.

Each plug-in phase is represented by its own Java interface, and each interface has exactly one method. Each plug-in interface, in turn, extends the curn parent PlugIn interface, which defines some additional methods that all plug-ins must provide.

A plug-in that intercepts multiple curn processing phases must implement the interfaces for each of the phases. Here are the plug-in phases, with their associated interfaces and methods, in execution order.

Plug-in interface Plug-in method Description
StartupPlugIn runStartupPlugIn() Called immediately after curn has started, but before it has loaded its configuration file or its cache. Intercepting this phase is useful if a plug-in needs to perform initialization.
MainConfigItemPlugIn runMainConfigItemPlugIn() Called immediately after curn has read and processed a configuration item in the main [curn] configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runMainConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignment; there's no real way to distinguish a variable assignment from a true configuration item.

A plug-in that wants to provide a configuration item in the main [curn] configuration section must implement this interface.
FeedConfigItemPlugIn runFeedConfigItemPlugIn() Called immediately after curn has read and processed a configuration item in a "Feed" configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runFeedConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignment; there's no real way to distinguish a variable assignment from a true configuration item.

A plug-in that wants to provide a per-feed configuration item must implement this interface.
OutputHandlerConfigItemPlugIn runOutputHandlerConfigItemPlugIn() Called immediately after curn has read and processed a configuration item in an "OutputHandler" configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runOutputHandlerConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignment; there's no real way to distinguish a variable assignment from a true configuration item.

A plug-in that wants to provide a per-output handler configuration item must implement this interface.
UnknownSectionConfigItemPlugIn runUnknownSectionConfigItemPlugIn() Called immediately after curn has read and processed a configuration item in an unknown configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runUnknownSectionConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignment; there's no real way to distinguish a variable assignment from a true configuration item.

A plug-in that requires its own configuration file section must implement this interface.
PostConfigPlugIn runPostConfigPlugIn() Called after the entire configuration has been read and parsed, but before any feeds are processed. Intercepting this event is useful for plug-ins that want to adjust the configuration. For instance:
• The curn command-line wrapper intercepts this plug-in phase so it can adjust the configuration to account for command line options.
• The RawFeedSaveAsPlugIn intercepts this plug-in phase so that it can validate its configuration parameters against each other.
CacheLoadedPlugIn runCacheLoadedPlugIn() Called after the curn cache has been read (and after any expired entries have been purged), but before any feeds are processed.
PreFeedDownloadPlugIn runPreFeedDownloadPlugIn() Called before a feed is downloaded (actually, before a feed is checked to see if it has new data). This method can return false to signal curn that the feed should be skipped. The plug-in method can also set values on the URLConnection used to download the feed, via URLConnection.setRequestProperty(). (Note that all URLs, even file: URLs, are passed into this method. Setting a request property on the URLConnection object for a file: URL will have no effect, though it isn't specifically harmful.) Setting a request property is useful for tasks such as:

• changing the default User-Agent value
• setting a non-standard HTTP header field
PostFeedDownloadPlugIn runPostFeedDownloadPlugIn() Called immediately after a feed is downloaded. This method can return false to signal curn that the feed should be skipped. For instance, a plug-in that filters on the unparsed XML feed content could use this method to weed out non-matching feeds before they are parsed.
PostFeedParsePlugIn runPostFeedParsePlugIn() Called immediately after a feed is parsed, but before it is otherwise processed. A post-feed parse plug-in has access to the parsed RSS feed data, via an RSSChannel object. This method can return false to signal curn that the feed should be skipped. For instance, a plug-in that filters on the parsed feed data could use this method to weed out non-matching feeds before they are processed any further. Similarly, a plug-in that edits the parsed data (removing or editing individual items, for instance) could use this method to do so.
PostFeedProcessPlugIn runPostFeedProcessPlugIn() Called after a feed is parsed and processed. The plug-in has access to the parsed RSS feed data, via an RSSChannel object. This method can return false to signal curn that the feed should be skipped.
PreFeedOutputPlugIn runPreFeedOutputPlugIn() Called immediately before a parsed feed is passed to an output handler. A pre-feed output plug-in cannot affect the feed's processing. (The time to stop the processing of a feed is in one of the other, preceding phases.) This method will be called multiple times for each feed if there are multiple output handlers.
PostFeedOutputPlugIn runPostFeedOutputPlugIn() Called immediately after a parsed feed is passed to an output handler. A post-feed output plug-in cannot affect the feed's processing. (The time to stop the processing of a feed is in one of the other, preceding phases.) This method will be called multiple times for each feed if there are multiple output handlers.
PostOutputHandlerFlushPlugIn runPostOutputHandlerFlushPlugIn() Called immediately after an output handler is flushed (i.e., after it has been called to process all feeds and its output has been written to a temporary file), but before that output is displayed, emailed, etc.
PostOutputPlugIn runPostOutputPlugIn() Called after curn has flushed all output handlers. A post-output plug-in is a useful place to consolidate the output from all output handlers. For instance, such a plug-in might pack all the output into a zip file, or email it. (The EmailOutputPlugIn works exactly this way.)
PreCacheSavePlugIn runPreCacheSavePlugIn() Called right before the curn cache is to be saved. A plug-in might choose to edit the cache at this point.
ShutdownPlugIn runShutdownPlugIn() Called just before curn gets ready to exit. This method allows plug-ins to perform any clean-up they require.
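
The request-property technique mentioned in the PreFeedDownloadPlugIn row uses the standard java.net.URLConnection API. The following is a minimal sketch of that JDK call on its own, outside of curn; the `customizeRequest()` helper, the header names and values, and the example.com URL are all made up for illustration, and the plug-in method's real signature is not shown here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

class RequestPropertySketch {
    /**
     * Hypothetical helper (not part of curn): sets request headers on a
     * connection that has not been connected yet.
     */
    static void customizeRequest(URLConnection conn) {
        // Replace the default User-Agent value.
        conn.setRequestProperty("User-Agent", "my-feed-reader/1.0");
        // Add a non-standard header field.
        conn.setRequestProperty("X-Example-Header", "example-value");
    }

    public static void main(String[] args) throws IOException {
        // openConnection() only creates the connection object; no network
        // traffic occurs until connect() or getInputStream() is called.
        URLConnection conn =
            URI.create("http://example.com/feed.xml").toURL().openConnection();
        customizeRequest(conn);
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
    }
}
```

Request properties must be set before the connection is opened; calling setRequestProperty() on an already-connected URLConnection throws an IllegalStateException.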

#### Registering the Plug-in with curn

A plug-in class doesn't have to do anything special to register itself with curn. Merely implementing the appropriate interfaces is sufficient, as long as curn can find the plug-in class at run-time.

#### A Simple Example Plug-in

Of course, there's nothing quite like an example to clarify things. So, here's a simple plug-in that:

• defines a ZipOutputTo configuration parameter, in the [curn] section, that specifies the path to a zip file
• takes all output files written by the output handlers and zips them up into the specified zip file

It uses a convenience class, org.clapper.util.io.Zipper, to simplify the chore of writing the zip file. Here's the source (with comments stripped out). Note that this is a stripped-down version of the actual ZipOutputPlugIn.

    import org.clapper.curn.Curn;
    import org.clapper.curn.CurnConfig;
    import org.clapper.curn.CurnException;
    import org.clapper.curn.MainConfigItemPlugIn;
    import org.clapper.curn.OutputHandler;
    import org.clapper.curn.PostOutputPlugIn;

    import org.clapper.util.config.ConfigurationException;
    import org.clapper.util.logging.Logger;
    import org.clapper.util.io.Zipper;
    import org.clapper.util.text.TextUtil;

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;

    public class ZipOutputPlugIn
        implements MainConfigItemPlugIn, PostOutputPlugIn {
        private static final String VAR_ZIP_FILE = "ZipOutputTo";

        private File zipFile = null;

        private static Logger log = new Logger (ZipOutputPlugIn.class);

        public ZipOutputPlugIn() {
        }

        public String getPlugInName() {
            return "Zip Output";
        }

        public String getPlugInSortKey() {
            return getPlugInName();
        }

        public void initPlugIn()
            throws CurnException {
        }

        public void runMainConfigItemPlugIn (String     sectionName,
                                             String     paramName,
                                             CurnConfig config)
            throws CurnException {
            try {
                if (paramName.equals (VAR_ZIP_FILE)) {
                    String zipFilePath =
                        config.getConfigurationValue (sectionName, paramName);
                    this.zipFile = new File (zipFilePath);
                }
            }
            catch (ConfigurationException ex) {
                throw new CurnException (ex);
            }
        }

        public void runPostOutputPlugIn (Collection<OutputHandler> outputHandlers)
            throws CurnException {
            if (zipFile != null) {
                log.debug ("Zipping output to \"" + zipFile.getPath() + "\"");
                zipOutput (outputHandlers);
            }
        }

        /*----------------------------------------------------------------------*\
                                    Private Methods
        \*----------------------------------------------------------------------*/

        private void zipOutput (Collection<OutputHandler> outputHandlers)
            throws CurnException {
            try {
                boolean haveFiles = false;

                // First, figure out whether we have any output or not.
                for (OutputHandler handler : outputHandlers) {
                    if (handler.hasGeneratedOutput()) {
                        haveFiles = true;
                        break;
                    }
                }

                if (! haveFiles) {
                    // None of the handlers produced any output.
                    log.error ("Warning: None of the output handlers " +
                               "produced any zippable output.");
                }
                else {
                    // Create the zip file.
                    Zipper zipper = new Zipper (zipFile, /* flatten */ true);

                    for (OutputHandler handler : outputHandlers) {
                        File file = handler.getGeneratedOutput();
                        if (file != null) {
                            log.debug ("Zipping \"" + file.getPath() + "\"");
                            zipper.put (file);
                        }
                    }

                    zipper.close();
                }
            }
            catch (IOException ex) {
                throw new CurnException (ex);
            }
        }
    }

To activate this plug-in, simply ensure that it is available to curn at startup, and add this configuration directive to the [curn] section of the configuration file:

    # Unix
    ZipOutputTo: /tmp/curn.zip

    # Windows
    #ZipOutputTo: c:\\temp\\curn.zip

For more examples, please see the source code for the stock plug-ins delivered with curn.
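
The Zipper convenience class hides the details of writing the archive. For readers curious about what a "flattened" zip file involves, here is a rough JDK-only sketch using java.util.zip. It is not the Zipper implementation; `FlatZipSketch` and `zipFlat()` are hypothetical names, and "flatten" is assumed here to mean that each file is stored under its bare file name, with directory components dropped.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class FlatZipSketch {
    /** Writes each file into zipFile under its bare name (no directories). */
    static void zipFlat(File zipFile, List<File> files) throws IOException {
        try (ZipOutputStream zip =
                 new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (File f : files) {
                // Using only getName() is what flattens the archive.
                zip.putNextEntry(new ZipEntry(f.getName()));
                Files.copy(f.toPath(), zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Two temporary files stand in for output-handler results.
        File text = File.createTempFile("curn-out", ".txt");
        File html = File.createTempFile("curn-out", ".html");
        Files.write(text.toPath(),
                    "plain text output".getBytes(StandardCharsets.UTF_8));
        Files.write(html.toPath(),
                    "<p>HTML output</p>".getBytes(StandardCharsets.UTF_8));

        File zipFile = File.createTempFile("curn", ".zip");
        zipFlat(zipFile, Arrays.asList(text, html));
        System.out.println("Wrote " + zipFile.length() + " bytes to " + zipFile);
    }
}
```

One consequence of flattening in this sketch: two input files with the same name in different directories would collide, and putNextEntry() would throw a ZipException for the duplicate entry.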

#### Persisting Data from a Plug-in

Beginning with curn 3.1, plug-ins (and output handlers, for that matter) can save runtime metadata to the curn metadata store (formerly called the "cache") and restore that data in a subsequent curn run. To be able to save and restore metadata, a plug-in must:

Once the plug-in is registered as a persistent data client, curn will:

• Ask the plug-in for its name/value pairs to be saved, just before curn exits
• Present the plug-in with name/value pairs to be parsed, just after curn starts up and reads the metadata (cache) file.
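
The save-at-exit, restore-at-startup cycle described above can be pictured with a small simulation. Everything in it is hypothetical: the `Client` interface, its method names and the in-memory `store` map are stand-ins invented for this sketch and do not reflect the real AbstractPersistentDataClient or DataPersister APIs.

```java
import java.util.HashMap;
import java.util.Map;

class PersistenceSketch {
    /** Hypothetical stand-in for a persistent data client; not curn's API. */
    interface Client {
        /** Asked for just before the program exits. */
        Map<String, String> dataToSave();

        /** Called just after startup, with the pairs saved by the last run. */
        void dataRestored(Map<String, String> data);
    }

    /** Example client that counts how many runs have taken place. */
    static class RunCounter implements Client {
        int previousRuns = 0;

        public Map<String, String> dataToSave() {
            Map<String, String> pairs = new HashMap<>();
            pairs.put("runs", Integer.toString(previousRuns + 1));
            return pairs;
        }

        public void dataRestored(Map<String, String> data) {
            String value = data.get("runs");
            previousRuns = (value == null) ? 0 : Integer.parseInt(value);
        }
    }

    /** Simulates one program run against an in-memory metadata store. */
    static void simulateRun(Client client, Map<String, String> store) {
        client.dataRestored(store);          // startup: hand back saved pairs
        store.putAll(client.dataToSave());   // exit: collect pairs to save
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        simulateRun(new RunCounter(), store);
        simulateRun(new RunCounter(), store);
        System.out.println("Stored metadata: " + store);
    }
}
```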

The following code fragment shows how a plug-in might register itself.

    import org.clapper.curn.DataPersister;
    import org.clapper.curn.DataPersisterFactory;
    import org.clapper.curn.AbstractPersistentDataClient;

    ...

    public class MyPlugIn extends AbstractPersistentDataClient, ...
    {
        ...
        public void initPlugIn() throws CurnException
        {
            DataPersister dataPersister = DataPersisterFactory.getInstance();
        }
        ...
    }


## Customizing curn's Output

### Writing Your Own FreeMarker Template

Because FreeMarker relies on external templates to create the final document, and because curn makes it easy to use custom-crafted templates, you can easily create your own custom documents, or your own branded HTML output, without having to write your own output handler. To create your own curn FreeMarker template, you must understand two things:

• How to create a FreeMarker template
• curn's specific FreeMarker data model

A tutorial on FreeMarker is beyond the scope of this document. The remainder of this section assumes you have some familiarity with FreeMarker. If you don't know how FreeMarker works, please consult the on-line FreeMarker documentation before reading this section.

A FreeMarker template relies on the presence of a tree of data, supplied by the program. FreeMarker calls this data tree a "data model". The FreeMarkerOutputHandler creates a curn-specific FreeMarker data model for use within a template. The FreeMarkerOutputHandler is as much a data mapper as an output handler: it maps curn's internal RSS data structures into a FreeMarker data model, then invokes the FreeMarker template engine to transform the template and data model into a document.

The curn FreeMarker data model is described below. The data model notation used here is similar to the notation used within the FreeMarker documentation.

    Tree                                          Description

    (root)
     |
     +-- curn
     |    |
     |    +-- showToolInfo ....................... [boolean] whether or not
     |    |                                        to display curn information
     |    |                                        in the output
     |    |
     |    +-- version ............................ [String] version of curn
     |    |
     |    +-- buildID ............................ [String] curn's build ID
     |
     +-- totalItems .............................. [int] total items for all channels
     |
     +-- dateGenerated ........................... [Date] date generated
     |
     +-- extraText ............................... [String] extra text, from the config
     |
     +-- encoding ................................ [String] encoding, from the config
     |
     +-- tableOfContents ......................... hash of table-of-contents data
     |    |
     |    +-- needed ............................. [boolean] whether a table of contents is needed
     |    |
     |    +-- (channels) ......................... sequence of channel table of contents items
     |         |
     |         +-- channel ...................... table of contents entry for one channel
     |         |    |
     |         |    +-- title .................. [String] channel title
     |         |    |
     |         |    +-- totalItems ............. [int] total items in channel
     |         |    |
     |         |    +-- channelAnchor .......... [String] HTML anchor name for channel
     |         |
     |         +-- channel
     |              ...
     |
     +-- (channels) .............................. sequence of channel (feed) data
          |
          +-- channel .......................... hash for a single channel (feed)
          |    |
          |    +-- index .................... [int] channel's index in list
          |    |
          |    +-- totalItems ............... [int] total items in channel
          |    |
          |    +-- title .................... [String] channel title
          |    |
          |    +-- description .............. [String] channel description, or "" if not available
          |    |
          |    +-- anchorName ............... [String] HTML anchor name for channel
          |    |
          |    +-- url ...................... [String] channel's URL (as published in the feed's XML)
          |    |
          |    +-- configuredURL ............ [String] channel/feed URL (as listed in the curn configuration file)
          |    |
          |    +-- id ....................... [String] channel's unique ID (which might just be the URL)
          |    |
          |    +-- date ..................... [Date] channel's last-modified date (might be missing)
          |    |
          |    +-- rssFormat ................ [String] RSS format of channel (Atom, RSS 0.92, etc.).
          |    |                              Empty if not to be shown.
          |    |
          |    +-- author ................... [String] the author or authors of the feed, combined in a single string, or ""
          |    |
          |    +-- (authors) ................ sequence of (String) names of authors of the feed
          |    |
          |    +-- (items) .................. sequence of channel items
          |         |
          |         +-- item ............... entry for one item
          |         |    |
          |         |    +-- index ........ [int] item's index in channel
          |         |    |
          |         |    +-- title ........ [String] item's title
          |         |    |
          |         |    +-- url .......... [String] item's URL (as published in the feed's XML)
          |         |    |
          |         |    +-- date ......... [Date] the date (might be missing)
          |         |    |
          |         |    +-- author ....... [String] the author or authors of the item, combined in a single string, or ""
          |         |    |
          |         |    +-- authors ...... a sequence of individual (String) author names. Might be empty.
          |         |    |
          |         |    +-- description .. [String] description/summary
          |         |
          |         +-- item
          |              ...
          |
          +-- channel
               ...
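
FreeMarker can accept an ordinary java.util.Map as its root data model, with nested maps acting as hashes and lists acting as sequences. As a rough picture of the tree above, the following sketch builds a small part of it by hand. It is not how the FreeMarkerOutputHandler constructs its model; `DataModelSketch` and `buildSampleModel()` are hypothetical names, the values are placeholders, and only a subset of the keys is filled in.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DataModelSketch {
    /** Builds a tiny, hand-made subset of the data model tree. */
    static Map<String, Object> buildSampleModel() {
        // One item hash, as found under channel -> (items).
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("index", 1);
        item.put("title", "Sample item");
        item.put("url", "http://example.com/item/1");
        item.put("author", "");
        item.put("authors", new ArrayList<String>());
        item.put("description", "A <b>sample</b> description");

        List<Map<String, Object>> items = new ArrayList<>();
        items.add(item);

        // One channel hash, as found under the root (channels) sequence.
        Map<String, Object> channel = new LinkedHashMap<>();
        channel.put("index", 1);
        channel.put("totalItems", items.size());
        channel.put("title", "Sample feed");
        channel.put("url", "http://example.com/feed.xml");
        channel.put("items", items);

        List<Map<String, Object>> channels = new ArrayList<>();
        channels.add(channel);

        // The "curn" hash with tool information.
        Map<String, Object> curn = new LinkedHashMap<>();
        curn.put("showToolInfo", Boolean.TRUE);
        curn.put("version", "(placeholder version)");

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("curn", curn);
        root.put("totalItems", items.size());
        root.put("dateGenerated", new Date());
        root.put("extraText", "");
        root.put("channels", channels);
        return root;
    }

    public static void main(String[] args) {
        System.out.println("Top-level keys: " + buildSampleModel().keySet());
    }
}
```

In a template, `channel.title` and `item.url` are then simply lookups along these nested hashes and sequences.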

The FreeMarkerOutputHandler also places four FreeMarker methods in the data model:

Methods in the Data Model
Method Name Explanation Arguments Examples
wrapText Wraps text at the end of the line, on word boundaries. Uses the org.clapper.util Java Utility Library's WordWrapWriter class.
• stringToWrap: Required. Specifies the string to wrap.
• indentation: Optional. How many blanks to indent the string from the left margin. Each wrapped line, including the first, is indented by this much. Defaults to 0.
• lineLength: Optional. The length of the line. Defaults to 79.
${wrapText (item.title, 4)}
${wrapText (item.description, 4, 50)}
indentText Indents the specified string.
• string: Required. Specifies the string to indent.
• indentation: Required. How many blanks to indent the string from the left margin.
${indentText (item.url, 4)}
${indentText (channel.url, 8)}
stripHTML Strips all HTML tags from the specified string. Especially useful for plain text templates.
• string: Required. Specifies the string to strip.
${stripHTML (item.description)}
escapeHTML Escapes special HTML characters in the specified string. For instance, "&" is converted to "&amp;", "<" is converted to "&lt;", etc.
• string: Required. Specifies the string to convert.
${escapeHTML (item.description)}
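
To make the behavior of these helpers more concrete, here is a simplified Java approximation of three of them. These are not curn's implementations: the class and method bodies are written for this sketch only, the tag-stripping regular expression is deliberately naive, and indenting every line of a multi-line string is an assumption about how indentText treats embedded newlines.

```java
class TemplateHelperSketch {
    /** Prefixes every line of s with the given number of blanks. */
    static String indentText(String s, int indentation) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < indentation; i++) {
            pad.append(' ');
        }
        StringBuilder out = new StringBuilder();
        String[] lines = s.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(pad).append(lines[i]);
        }
        return out.toString();
    }

    /** Removes anything that looks like an HTML tag (naive approach). */
    static String stripHTML(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    /** Replaces the characters that are special in HTML with entities. */
    static String escapeHTML(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;");  break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String description = "<p>Fish & <b>chips</b></p>";
        System.out.println(indentText(stripHTML(description), 4));
        System.out.println(escapeHTML(description));
    }
}
```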

Below is a sample template, which is largely identical to the built-in text template. This template illustrates the use of the data model. You can find a version of the text template, as well as the HTML template and the simple summary template, by following these links:

Sample FreeMarker template
    ${title}
    <#if extraText != "">
    ${wrapText (extraText)}
    </#if>

    <#list channels as channel>
    ---------------------------------------------------------------------------
    ${wrapText (channel.title, 0)}
    ${channel.url}
    <#if channel.date?exists>
    ${channel.date?string("E, dd MMM, yyyy 'at' HH:mm:ss")}
    </#if>

    <#list channel.items as item>
    ${wrapText (item.title, 4)}
    ${indentText (item.url, 4)}

    <#assign desc = stripHTML(item.description)>
    <#if desc != "">
    ${wrapText (desc, 8)}
    </#if>

    </#list>
    </#list>

    ---------------------------------------------------------------------------
    <#if (curn.showToolInfo)>
    curn, ${curn.version}
    Generated ${dateGenerated?string("EEEEEE, dd MMMM, yyyy 'at' HH:mm:ss zzz")}
    </#if>


### Writing Your Own Output Handler

There are two ways to write your own output handler: You can write a Java class that implements the output handler, or you can write a script using a scripting language supported by the Apache Jakarta Bean Scripting Framework (BSF). Both approaches are discussed below.

But—before you write your own output handler, consider whether you can accomplish the same ends by writing a FreeMarker template and using the FreeMarkerOutputHandler. If you're planning to create a different output file format (as opposed to writing an output handler to send data over a network connection or to a database), then there's a good chance that writing a FreeMarker template will be simpler and faster.

#### Writing a Java Output Handler

Writing a new output handler is reasonably straightforward:

To illustrate the concept, let's look at an output handler that simply writes each channel and its items as plain text. (As it turns out, this example is a stripped-down version of the existing org.clapper.curn.output.TextOutputHandler class. The real source code has more comments and documentation and also extends a common base class.)

First, let's look at the top of the class and the required init() method:

    import org.clapper.curn.CurnConfig;
    import org.clapper.curn.CurnException;
    import org.clapper.curn.ConfiguredOutputHandler;
    import org.clapper.curn.OutputHandler;
    import org.clapper.curn.FeedInfo;
    import org.clapper.curn.parser.RSSChannel;
    import org.clapper.curn.parser.RSSItem;

    import org.clapper.util.io.WordWrapWriter;
    import org.clapper.util.text.TextUtil;
    import org.clapper.util.text.Unicode;
    import org.clapper.util.misc.Logger;
    import org.clapper.util.config.ConfigurationException;
    import org.clapper.util.config.NoSuchSectionException;

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.File;
    import java.io.FileNotFoundException;

    import java.util.Date;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;

    public class MyOutputHandler implements OutputHandler {
        private static final String HORIZONTAL_RULE =
            "---------------------------------------" +
            "---------------------------------------";

        private WordWrapWriter out        = null;
        private CurnConfig     config     = null;
        private String         message    = null;
        private Collection     channels   = new ArrayList();
        private int            totalItems = 0;
        private File           outputFile = null;
        private boolean        saveOnly   = false;

        private static Logger log = new Logger (MyOutputHandler.class);

        public MyOutputHandler() {
        }

        /**
         * Initializes the output handler for another set of RSS channels.
         *
         * @param config     the parsed curn configuration data
         * @param cfgHandler the ConfiguredOutputHandler wrapper object that
         *                   contains this object; the wrapper has some useful
         *                   metadata, such as the object's configuration section
         *                   name and extra variables.
         *
         * @throws ConfigurationException configuration error
         * @throws CurnException          some other initialization error
         */
        public void init (CurnConfig config, ConfiguredOutputHandler cfgHandler)
            throws ConfigurationException, CurnException {
            String sectionName = cfgHandler.getSectionName();
            String saveAs = null;

            this.config = config;

            try {
                if (sectionName != null) {
                    saveAs = config.getOptionalStringValue (sectionName,
                                                            "SaveAs",
                                                            null);
                    saveOnly = config.getOptionalBooleanValue (sectionName,
                                                               "SaveOnly",
                                                               false);
                    message = config.getOptionalStringValue (sectionName,
                                                             "Message",
                                                             null);

                    if (saveOnly && (saveAs == null)) {
                        throw new ConfigurationException (sectionName,
                                                          "SaveOnly can only be " +
                                                          "specified if SaveAs " +
                                                          "is defined.");
                    }
                }
            }
            catch (NoSuchSectionException ex) {
                throw new ConfigurationException (ex);
            }

            if (saveAs != null)
                outputFile = new File (saveAs);
            else {
                try {
                    outputFile = File.createTempFile ("curn", null);
                    outputFile.deleteOnExit();
                }
                catch (IOException ex) {
                    throw new CurnException ("Can't create temporary file.");
                }
            }

            try {
                log.debug ("Opening output file \"" + outputFile + "\"");
                out = new WordWrapWriter (new FileWriter (outputFile));
            }
            catch (IOException ex) {
                throw new CurnException ("Can't open file \"" +
                                         outputFile.getPath() +
                                         "\" for output", ex);
            }

            channels.clear();
            totalItems = 0;
        }

curn calls the init() method right after instantiating the output handler class. One of the init() method's primary responsibilities is to handle any special handler-specific configuration variables. It does so by:

1. asking the ConfiguredOutputHandler object for the section name that's associated with the output handler class
2. requesting the specific configuration variable values from that section
3. processing the results, if any

The init() method also performs any other initialization required by the output handler class.

Note: The sample init() method, above, does a little more work than it needs to do. As it turns out, the curn API provides a useful abstract base class called org.clapper.curn.output.FileOutputHandler that implements the OutputHandler interface. FileOutputHandler provides an init() method that:

• automatically handles the "SaveAs" and "SaveOnly" configuration parameters
• handles creation of the output file, much the same way the sample init() method, above, does
• provides some useful protected methods for the subclass, such as getOutputFile() (which returns the file to which the subclass should write its output).

FileOutputHandler requires that the subclass provide:

• an initOutputHandler() method, to handle subclass initialization. This method takes the same parameters as the init() method.
• the output methods (e.g., displayChannel()), discussed below

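With that in mind, let's simplify our original init() method and class definition. The changes from the original, above, are the added FileOutputHandler import, the extends FileOutputHandler clause on the class, and the replacement of init() with a shorter initOutputHandler() method.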

    import org.clapper.curn.CurnConfig;
    import org.clapper.curn.CurnException;
    import org.clapper.curn.ConfiguredOutputHandler;
    import org.clapper.curn.OutputHandler;
    import org.clapper.curn.FeedInfo;
    import org.clapper.curn.output.FileOutputHandler;
    import org.clapper.curn.parser.RSSChannel;
    import org.clapper.curn.parser.RSSItem;

    import org.clapper.util.io.WordWrapWriter;
    import org.clapper.util.text.TextUtil;
    import org.clapper.util.text.Unicode;
    import org.clapper.util.misc.Logger;
    import org.clapper.util.config.ConfigurationException;
    import org.clapper.util.config.NoSuchSectionException;

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.File;
    import java.io.FileNotFoundException;

    import java.util.Date;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;

    public class MyOutputHandler extends FileOutputHandler {
        private static final String HORIZONTAL_RULE =
            "---------------------------------------" +
            "---------------------------------------";

        private WordWrapWriter out        = null;
        private CurnConfig     config     = null;
        private String         message    = null;
        private Collection     channels   = new ArrayList();
        private int            totalItems = 0;
        private File           outputFile = null;
        private boolean        saveOnly   = false;

        private static Logger log = new Logger (MyOutputHandler.class);

        public MyOutputHandler() {
        }

        /**
         * Perform any subclass-specific initialization. Subclasses must
         * override this method.
         *
         * @param config     the parsed curn configuration data
         * @param cfgHandler the ConfiguredOutputHandler wrapper
         *                   containing this object; the wrapper has some useful
         *                   metadata, such as the object's configuration section
         *                   name and extra variables.
         *
         * @throws ConfigurationException configuration error
         * @throws CurnException          some other initialization error
         */
        public void initOutputHandler (CurnConfig              config,
                                       ConfiguredOutputHandler cfgHandler)
            throws ConfigurationException, CurnException {
            String sectionName = cfgHandler.getSectionName();

            this.config = config;

            try {
                if (sectionName != null) {
                    // Only need to handle the "Message" parameter. The
                    // FileOutputHandler parent class handles "SaveAs" and
                    // "SaveOnly"
                    message = config.getOptionalStringValue (sectionName,
                                                             "Message",
                                                             null);
                }
            }
            catch (NoSuchSectionException ex) {
                throw new ConfigurationException (ex);
            }

            outputFile = super.getOutputFile();

            try {
                log.debug ("Opening output file \"" + outputFile + "\"");
                out = new WordWrapWriter (new FileWriter (outputFile));
            }
            catch (IOException ex) {
                throw new CurnException ("Can't open file \"" +
                                         outputFile.getPath() +
                                         "\" for output", ex);
            }

            channels.clear();
            totalItems = 0;
        }

Next, let's look at the output-related methods. There are several output-related methods that are required by the OutputHandler interface:

• displayChannel(): Displays the output for a parsed channel. This method takes the parsed RSSChannel object and a special org.clapper.curn.FeedInfo object that contains the configuration information about the feed (i.e., its URL, its title, etc.).
• flush(): Flushes the output and closes the output files.
• getContentType(): Returns the MIME type associated with the generated output.
• hasGeneratedOutput(): Returns true if the handler generated output that the user should see, false otherwise. In this handler's case, hasGeneratedOutput() is provided by the parent FileOutputHandler class; it returns true if the output file has a non-zero length and the SaveOnly configuration parameter has not been set. (Recall that "SaveOnly" says that the generated output should only be saved, not displayed.)
• getGeneratedOutput(): Returns an InputStream for reading the generated output (if hasGeneratedOutput() returns true) or null otherwise. Again, we don't have to implement that method in our sample output handler, because the parent FileOutputHandler provides it for us.
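
The order in which these methods are used can be pictured with a toy model. The sketch below is not the curn API: the `Handler` interface is a simplified stand-in that uses plain strings instead of RSSChannel and FeedInfo objects, and it returns the generated output as a String purely to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

class HandlerLifecycleSketch {
    /** Simplified stand-in for the handler contract; not curn's OutputHandler. */
    interface Handler {
        void displayChannel(String title, List<String> itemTitles);
        void flush();
        String getContentType();
        boolean hasGeneratedOutput();
        String getGeneratedOutput();
    }

    /** Accumulates plain text in memory rather than in an output file. */
    static class PlainTextHandler implements Handler {
        private final StringBuilder out = new StringBuilder();
        private boolean flushed = false;

        public void displayChannel(String title, List<String> itemTitles) {
            out.append(title).append('\n');
            for (String itemTitle : itemTitles) {
                out.append("    ").append(itemTitle).append('\n');
            }
        }

        public void flush() {
            out.append("-- end of output --\n");
            flushed = true;
        }

        public String getContentType() {
            return "text/plain";
        }

        public boolean hasGeneratedOutput() {
            return flushed && out.length() > 0;
        }

        public String getGeneratedOutput() {
            return hasGeneratedOutput() ? out.toString() : null;
        }
    }

    public static void main(String[] args) {
        Handler handler = new PlainTextHandler();

        // One displayChannel() call per feed, then a single flush().
        handler.displayChannel("Feed A", Arrays.asList("First item", "Second item"));
        handler.displayChannel("Feed B", Arrays.asList("Only item"));
        handler.flush();

        if (handler.hasGeneratedOutput()) {
            System.out.println("Content type: " + handler.getContentType());
            System.out.print(handler.getGeneratedOutput());
        }
    }
}
```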

Here's the code for the methods our sample class must implement:

        public void displayChannel (RSSChannel channel, FeedInfo feedInfo)
            throws CurnException {
            Collection items = channel.getItems();
            int indentLevel = setIndent (0);

            if ((items.size() != 0) || (! config.beQuiet())) {
                // Emit a site (channel) header.
                out.println();
                out.println (HORIZONTAL_RULE);
                out.println (convert (channel.getTitle()));
                out.println (channel.getLink().toString());

                Date date = channel.getPublicationDate();
                if (date != null)
                    out.println (date.toString());

                if (config.showRSSVersion())
                    out.println ("(Format: " + channel.getRSSFormat() + ")");
            }

            if (items.size() != 0) {
                // Now, process each item.
                String s;

                for (Iterator it = items.iterator(); it.hasNext(); ) {
                    RSSItem item = (RSSItem) it.next();

                    setIndent (++indentLevel);
                    out.println ();

                    s = item.getTitle();
                    out.println ((s == null) ? "(No Title)" : convert (s));

                    s = item.getAuthor();
                    if (s != null)
                        out.println ("By " + convert (s));

                    out.println (item.getLink().toString());

                    Date date = item.getPublicationDate();
                    if (date != null)
                        out.println (date.toString());

                    s = item.getSummary();
                    if (TextUtil.stringIsEmpty (s)) {
                        // Hack for feeds that have no summary but have
                        // content. If the content is small enough, use it
                        // as the summary.
                        s = item.getFirstContentOfType (new String[] {
                                                            "text/plain",
                                                            "text/html"
                                                        });
                        if (! TextUtil.stringIsEmpty (s)) {
                            s = s.trim();
                            if (s.length() > CONTENT_AS_SUMMARY_MAXSIZE)
                                s = null;
                        }
                    }
                    else {
                        s = s.trim();
                    }

                    if (s != null) {
                        out.println();
                        setIndent (++indentLevel);
                        out.println (convert (s));
                        setIndent (--indentLevel);
                    }

                    setIndent (--indentLevel);
                }
            }
            else {
                if (! config.beQuiet()) {
                    setIndent (++indentLevel);
                    out.println ();
                    out.println ("No new items");
                    setIndent (--indentLevel);
                }
            }

            setIndent (0);
        }

        private int setIndent (int level) {
            StringBuffer buf = new StringBuffer();

            for (int i = 0; i < level; i++)
                buf.append (" ");

            out.setPrefix (buf.toString());
            return level;
        }

        public void flush() throws CurnException {
            out.println ();
            out.println (HORIZONTAL_RULE);
            out.println ("curn, version " + Version.VERSION);
            out.println ("Generated " + new Date().toString());
            out.flush();
            out = null;
        }

        public String getContentType() {
            return "text/plain";
        }

This particular output handler's displayChannel() method summarizes the channel and item data, writing it to the output file that the init() method opened. The flush() method simply finishes the display.

With this model, it's possible to create output handlers that produce all kinds of output, including (for instance):

• HTML
• XML (e.g., for converting all incoming RSS feeds to one type of RSS feed)
• data sent over a network connection
• etc.

#### Writing a Script Output Handler

Writing a script output handler is even simpler, in a way, than writing a Java output handler. You simply write a script in a supported language, then configure an instance of the ScriptOutputHandler class to point to your script.

##### Supported Scripting Languages

The ScriptOutputHandler uses the Apache Jakarta Bean Scripting Framework (BSF) or the JSR 223 scripting engine to call scripts. It currently supports any scripting language that has a binding to either scripting infrastructure. See the configuration section for the ScriptOutputHandler class for the list of sample languages.

Note: curn comes bundled with a compatible version of the BSF bsf.jar file. If you're running Java 6, and you want JSR 223 support for languages other than Javascript, please see https://scripting.dev.java.net/.

##### Writing the Script

The ScriptOutputHandler class's displayChannel() method doesn't actually generate any output. Instead, it buffers the channels so that the flush() method can invoke the script. That way, the overhead of invoking the script occurs only once.
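
The buffer-then-invoke pattern is easy to picture in isolation. In the sketch below, which is not the ScriptOutputHandler source, a `Consumer` callback stands in for the script, channel titles stand in for the buffered channel objects, and `BufferingHandler` is a hypothetical class written only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

class BufferThenInvokeSketch {
    /** Collects channels as they arrive; runs the "script" once, in flush(). */
    static class BufferingHandler {
        private final List<String> channels = new ArrayList<>();
        private final Consumer<List<String>> script;

        BufferingHandler(Consumer<List<String>> script) {
            this.script = script;
        }

        /** Produces no output; it only remembers the channel for later. */
        void displayChannel(String channelTitle) {
            channels.add(channelTitle);
        }

        /** Hands every buffered channel to the script in a single call. */
        void flush() {
            script.accept(Collections.unmodifiableList(new ArrayList<>(channels)));
            channels.clear();
        }
    }

    public static void main(String[] args) {
        BufferingHandler handler = new BufferingHandler(
            buffered -> System.out.println("Script invoked once with " + buffered));

        handler.displayChannel("Feed A");
        handler.displayChannel("Feed B");
        handler.displayChannel("Feed C");
        handler.flush();
    }
}
```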

The ScriptOutputHandler object exposes a special curn object to the invoked script; that object contains the following fields and methods, all of which are available to the script. The curn object is exposed via BSFManager.declareBean(), which means it is a global variable that is automatically accessible to the script, without the need for the script to call any methods to find it.

curn field or method name Corresponding "registered" BSF bean (for backward compatibility) Java type Explanation
curn.channels channels java.util.Collection A Collection of special internal objects that wrap both RSSChannel and FeedInfo objects. The wrapper objects provide two methods:
• getChannel() gets the RSSChannel object
• getFeedInfo() gets the FeedInfo object. This object contains useful metadata about the channel.
curn.outputPath outputPath java.lang.String The path to the output file. The script should write its output to that file. Overwriting the file is fine. If the script generates no output, then it can ignore the file.
curn.config config CurnConfig The org.clapper.curn.CurnConfig object that represents the parsed configuration data. Useful in conjunction with the "configSection" object, to parse additional parameters from the configuration.
curn.configSection configSection java.lang.String The name of the configuration file section in which the output handler was defined. Useful if the script wants to access additional script-specific configuration data.
curn.setMIMEType()     The script should call this method and pass it the MIME type that corresponds to the generated output. If the script generates no output, then it can ignore this method.
mimeType java.lang.PrintWriter A PrintWriter object to which the script should print the MIME type that corresponds to the generated output. If the script generates no output, then it can ignore this object.
curn.logger logger org.clapper.util.misc.Logger A Logger object, useful for logging messages to the curn log file.
version java.lang.String Full curn version string, in case the script wants to include it in the generated output
curn.getVersion()   java.lang.String Method that returns the full curn version string, in case the script wants to include it in the generated output
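
To make the contract in the table concrete, here is a minimal sketch of the core of a handler script, written so that it also runs under plain Python outside curn. The FakeCurn, FakeWrapper, and FakeChannel classes, and the run_script() and demo() functions, are hypothetical stand-ins invented for this illustration; only the member names (channels, outputPath, setMIMEType(), getVersion(), getChannel(), getFeedInfo(), getTitle()) come from this guide. In a real script the global curn object already exists, so the stand-ins would be dropped.

```python
import os
import tempfile


class FakeChannel:
    """Hypothetical stand-in for an RSSChannel; only getTitle() is modeled."""
    def __init__(self, title):
        self._title = title

    def getTitle(self):
        return self._title


class FakeWrapper:
    """Hypothetical stand-in for the wrapper objects in curn.channels."""
    def __init__(self, channel, feed_info=None):
        self._channel = channel
        self._feed_info = feed_info

    def getChannel(self):
        return self._channel

    def getFeedInfo(self):
        return self._feed_info


class FakeCurn:
    """Hypothetical stand-in for the global 'curn' object."""
    def __init__(self, channels, output_path):
        self.channels = channels
        self.outputPath = output_path
        self.mime_type = None  # recorded here so the demo can inspect it

    def setMIMEType(self, mime_type):
        self.mime_type = mime_type

    def getVersion(self):
        return "curn (stand-in version string)"


def run_script(curn):
    """Write one line per channel title, then report the MIME type."""
    out = open(curn.outputPath, "w")
    try:
        for wrapper in curn.channels:
            out.write(wrapper.getChannel().getTitle() + "\n")
        out.write(curn.getVersion() + "\n")
    finally:
        out.close()
    curn.setMIMEType("text/plain")


def demo():
    """Run the sketch against a temporary file and return what it produced."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    curn = FakeCurn([FakeWrapper(FakeChannel("Example Feed"))], path)
    run_script(curn)
    f = open(path)
    text = f.read()
    f.close()
    os.remove(path)
    return text, curn.mime_type
```

The point of the sketch is the order of operations: read curn.channels, write to curn.outputPath, then call curn.setMIMEType() once output has been generated.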

Here's a sample Jython script that shows how to put it all together. This script reimplements most of the functionality of the org.clapper.curn.output.TextOutputHandler Java class that comes with curn. (Note that the script uses an org.clapper.util.io.WordWrapWriter object for its output. While the word-wrapping functionality could have been implemented directly in Jython, this strategy both saves time and demonstrates how easily you can use existing Java classes from a Jython script.)

Class documentation, copyrights, etc., have been stripped from the script for brevity. You can find the complete script, along with a JRuby implementation of the same functionality, in the curn source bundle, in directory src/org/clapper/curn/output/script.

    import sys
    from org.clapper.curn import CurnException
    from org.clapper.util.io import WordWrapWriter

    HORIZONTAL_RULE = "---------------------------------------" \
                    + "---------------------------------------"

    def process_channels():
        """
        Process the channels passed in through the Bean Scripting Framework.
        """
        # If we didn't care about wrapping the output, we'd just use:
        #
        #     out = open (curn.outputPath, "w")
        #
        # But it'd be nice to wrap long summaries on word boundaries at
        # the end of an 80-character line. For that reason, we use the
        # Java org.clapper.util.io.WordWrapWriter class.
        out = WordWrapWriter (open (curn.outputPath, "w"))
        out.setPrefix ("")

        msg = curn.config.getOptionalStringValue (curn.configSection,
                                                  "Message",
                                                  None)
        totalNew = 0

        # First, count the total number of new items
        for channel_wrapper in curn.channels:
            channel = channel_wrapper.getChannel()
            totalNew = totalNew + channel.getItems().size()

        if totalNew > 0:
            # If the config file specifies a message for this handler,
            # display it.
            if msg != None:
                out.println (msg)
                out.println ()

            # Now, process the items
            indentation = 0
            for channel_wrapper in curn.channels:
                channel = channel_wrapper.getChannel()
                feed_info = channel_wrapper.getFeedInfo()
                process_channel (out, channel, feed_info, indentation)

            curn.setMIMEType ("text/plain")

            # Output a footer
            indent (out, indentation)
            out.println ()
            out.println (HORIZONTAL_RULE)
            out.println (curn.getVersion())
            out.flush()

    def process_channel (out, channel, feed_info, indentation):
        """
        Process all items within a channel.
        """
        curn.logger.debug ("Processing channel \"" + str (channel.getTitle()) + "\"")

        # Print a channel header
        indent (out, indentation)
        out.println (HORIZONTAL_RULE)
        out.println (channel.getTitle())
        out.println (channel.getLinks()[0].toString())
        out.println (str (channel.getItems().size()) + " item(s)")

        date = channel.getPublicationDate()
        if date != None:
            out.println (str (date))

        if curn.config.showRSSVersion():
            out.println ("(Format: " + channel.getRSSFormat() + ")")

        indentation = indentation + 1
        indent (out, indentation)
        for item in channel.getItems():
            # These are RSSItem objects
            out.println()
            out.println (item.getTitle())
            out.println (str (item.getLinks()[0]))

            date = item.getPublicationDate()
            if date != None:
                out.println (str (date))

            out.println()

            summary = item.getSummary()
            if summary != None:
                indent (out, indentation + 1)
                out.println (summary)
                indent (out, indentation)

    def indent (out, indentation):
        """
        Apply a level of indentation to a WordWrapWriter, by changing
        the WordWrapWriter's prefix string.

        out         - the org.clapper.util.io.WordWrapWriter
        indentation - the numeric indentation level
        """
        prefix = ""
        for i in range (indentation):
            prefix = prefix + " "

        out.setPrefix (prefix)

    # ---------------------------------------------------------------------------

    process_channels()

##### Configuring the Script Output Handler

The configuration section for the ScriptOutputHandler class provides a detailed description of the configuration parameters. Here is a sample configuration entry for our TextOutputHandler.py Jython script.

    [OutputHandlerJythonScript]
    Class: org.clapper.curn.output.script.ScriptOutputHandler
    #SaveAs: ${system:user.home}/curn/rss-py.txt
    #SaveOnly: true
    Language: jython
    Script:${system:user.home}/curn/TextOutputHandler.py
    Message: Copy saved in file ${system:user.home}/curn/rss-py.txt

## Installing Supporting Software

As noted in the Overview of Plug-In Support section, curn searches for plug-ins in the following directories:

• curn_home/plugins
• user_home/curn/plugins
• user_home/.curn/plugins

curn implicitly adds those directories to the internal class path used by the curn custom class loader. curn also loads the following directories into its class loader:

• curn_home/lib
• user_home/curn/lib
• user_home/.curn/lib

If you've written or installed a plug-in, output handler or RSS parser adapter that requires some third-party support software (e.g., a third-party RSS parser engine), or if you want to enable logging using a non-bundled logging framework such as Log4J, you'll have to install the appropriate support jars somewhere where curn can find them. If you install the jars in the lib directory underneath the curn installation directory, then they'll be available to any user who runs that curn installation. However, if you don't have permission to update that directory, or you want to make your extensions available only to yourself, then you can install your software in the appropriate lib directory under your home directory.

As of version 3.0, non-bundled plug-ins must be placed in one of the sanctioned plug-in directories and non-bundled third-party software must be placed in one of the sanctioned lib directories. You cannot simply add the appropriate jars to your CLASSPATH environment variable. To support plug-ins properly, curn's plug-in architecture uses a different Java class loader that does not honor the CLASSPATH setting.

# Troubleshooting

In the event of problems, your first step should be to enable logging.
The next section discusses curn's logging infrastructure.

## Logging

curn issues log messages via the Jakarta Commons Logging API (JCL), so it'll log to any JCL-compatible logging framework. The two most popular frameworks are Log4J and the java.util.logging framework that comes with the Java JDK or J2SE runtime. curn's graphical installer automatically installs the JCL jars, but it does not install Log4J (or any other third-party logging framework); so, by default, curn will log via the java.util.logging API.

When initialized at runtime, some underlying logging frameworks will automatically begin logging if they find an appropriate configuration file in some default location. To prevent this behavior, curn does not initialize the JCL layer unless the --logging command-line parameter is specified. If --logging is not specified, curn will not issue log messages even if default logging configuration files are present.

All curn Java classes are in packages within the org.clapper.curn namespace. Each logging framework has its own initialization files; the following two sections show how you might enable logging for the two most popular logging frameworks, java.util.logging and Log4J.

### java.util.logging

Note: Please be aware that, as of curn 3.0, there are some subtle "issues" with JDK logging. If you happened to be using the org.clapper.util.logging.JavaUtilLoggingTextFormatter class to format your output, you're out of luck. In general, you're far better off using Log4J.

The gory details, for those who care about such things, follow.

curn now uses a special bootstrap mechanism to enable the use of plug-ins. As part of this bootstrap logic, curn installs its own class loader. However, the built-in JDK logging API doesn't play well in that environment. It always uses the system class loader (the CLASSPATH-driven class loader) to find its classes.
Normally, this isn't too much of a problem, but curn doesn't rely on CLASSPATH to find its code; instead, it uses its own class loader. Among the jar files curn's class loader searches is the org.clapper.util utility library, the very library that contains the JavaUtilLoggingTextFormatter class. If you try to ensure that the org.clapper.util utility library is in the CLASSPATH (and, therefore, available to the JDK logging API), you run the risk of causing problems with curn's runtime environment. Further, if you are using any third-party formatters (such as the SMTPHandler formatter), you have to ensure that the jar files containing those formatters are listed in the classpath. To do that, you'll have to modify the shell script or Windows command file used to invoke curn.

If you're not using the JavaUtilLoggingTextFormatter class in the org.clapper.util library, and you're not using any third-party formatters, then this warning probably doesn't apply to you.

This section assumes that you're using a properties file to configure the logging framework; if you're using a custom logging configuration class, you'll have to work out the configuration details yourself.

When you use the java.util.logging framework with curn, you must specify the location of the logging configuration file with the system property java.util.logging.config.file. According to the Javadoc for the LogManager class, the property may be set via the Preferences API or as a command-line property definition passed to the java command.
To invoke curn on a Unix-like system, so that it logs through the java.util.logging framework, you might use a command line like this:

    java -Djava.util.logging.config.file=/home/bmc/curn/logging.properties org.clapper.curn.Tool --logging /home/bmc/curn/curn.cfg

The curn shell script installed by the graphical installer understands -D parameters; if you use the shell script, you can shorten the above command to:

    curn -Djava.util.logging.config.file=/home/bmc/curn/logging.properties --logging /home/bmc/curn/curn.cfg

Alternatively, you can set those values in the CURN_JAVA_VM_ARGS environment variable, and curn will automatically supply them to the Java virtual machine. For example:

    export CURN_JAVA_VM_ARGS="-Djava.util.logging.config.file=/home/bmc/curn/logging.properties"
    curn --logging /home/bmc/.curn/curn.cfg

The Windows curn.bat command file currently does not understand -D parameters, so you must use the CURN_JAVA_VM_ARGS environment variable method. For instance:

    set CURN_JAVA_VM_ARGS=-Djava.util.logging.config.file=/home/bmc/curn/logging.properties
    curn --logging %HOME%\.curn\curn.cfg

Here's a configuration file that logs all messages at the "info" level or higher to a file. (Change "INFO" to "FINEST" to get messages at the debug level.)

    handlers=java.util.logging.FileHandler
    .level=FINEST
    # %h is replaced with the user's home directory
    java.util.logging.FileHandler.pattern = %h/curn/log.out
    java.util.logging.FileHandler.level=FINEST
    java.util.logging.FileHandler.count = 1
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
    org.clapper.curn.level=INFO

If you want to have exceptions mailed to you (which can be useful when running curn from a scheduler, such as cron(8)), then download and install the SMTPHandler class from smtphandler.sourceforge.net, and use this configuration file, instead:

    handlers=java.util.logging.FileHandler
    .level=FINEST
    # %h is replaced with the user's home directory
    java.util.logging.FileHandler.pattern = %h/curn/log.out
    java.util.logging.FileHandler.level=FINEST
    java.util.logging.FileHandler.count = 1
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
    smtphandler.SMTPHandler.level=WARNING
    smtphandler.SMTPHandler.smtpHost=your_smtp_host_here
    smtphandler.SMTPHandler.to=your_email_address_here
    smtphandler.SMTPHandler.from=your_email_address_here
    smtphandler.SMTPHandler.subject=[SMTPHandler] curn error
    smtphandler.SMTPHandler.bufferSize=4096
    smtphandler.SMTPHandler.formatter=smtphandler.SMTPHandler
    org.clapper.curn.level=INFO

Note that both files use the org.clapper.util.misc.JDK14TextLogFormatter formatter class, instead of the JDK-supplied java.util.logging.SimpleFormatter. I don't care for the text format that SimpleFormatter produces, so I use a formatter from the org.clapper.util utility library that produces output that's similar to the default Log4J text formatter.

Obviously, you can use any formatter you wish, including the java.util.logging.XMLFormatter class.

### Log4J

Before you can use the Log4J framework with curn, you must install the log4j.jar file so that curn can find it. (You can download that file from http://logging.apache.org/log4j/.)
See the section entitled Installing Supporting Software for details on where to install log4j.jar.

When you use the Log4J framework with curn, you must specify the location of the logging configuration file using the system property log4j.configuration. Unlike the java.util.logging framework, the argument to log4j.configuration is not a pathname; it's a URL.

To invoke curn on a Unix-like system, so that it logs through the Log4J framework, you might use a command line like this:

    java -Dlog4j.configuration=file:///home/bmc/curn/logging.properties org.clapper.curn.Tool --logging /home/bmc/curn/curn.cfg

The curn shell script installed by the graphical installer understands -D parameters; if you use the shell script, you can shorten the above command to:

    curn -Dlog4j.configuration=file:///home/bmc/curn/logging.properties --logging /home/bmc/curn/curn.cfg

Alternatively, you can set those values in the CURN_JAVA_VM_ARGS environment variable, and curn will automatically supply them to the Java virtual machine. For example:

    export CURN_JAVA_VM_ARGS="-Dlog4j.configuration=file:///home/bmc/curn/logging.properties"
    curn --logging /home/bmc/.curn/curn.cfg

The Windows curn.bat command file currently does not understand -D parameters, so you must use the CURN_JAVA_VM_ARGS environment variable method. For instance:

    set CURN_JAVA_VM_ARGS=-Dlog4j.configuration=file:///home/bmc/curn/logging.properties
    curn --logging %HOME%\.curn\curn.cfg

Here's a configuration file that logs all messages at the "info" level or higher to a file. (Change "info" to "debug" to get messages at the debug level.)

    log4j.rootLogger=info, File
    log4j.appender.File=org.apache.log4j.FileAppender
    log4j.appender.File.layout=org.apache.log4j.PatternLayout
    log4j.appender.File.file=${user.home}/curn/log.out
    # Overwrite the file each time
    log4j.appender.File.append=false
    # Print the date in ISO 8601 format
    log4j.appender.File.layout.ConversionPattern=%d %-5p (%c{1}): %m%n
    log4j.logger.org.clapper.curn=info
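
Because log4j.configuration takes a URL rather than a pathname, it can help to derive the value from the path mechanically. The following is a small sketch in Python, not part of curn; the function name log4j_vm_arg is hypothetical, and it assumes an absolute Unix-style path.

```python
from urllib.parse import quote


def log4j_vm_arg(config_path):
    """Build a -Dlog4j.configuration argument from an absolute Unix path.

    Hypothetical helper for illustration only; curn does not provide it.
    """
    if not config_path.startswith("/"):
        raise ValueError("expected an absolute Unix-style path")
    # quote() percent-encodes characters such as spaces but leaves "/" alone.
    return "-Dlog4j.configuration=file://" + quote(config_path)


print(log4j_vm_arg("/home/bmc/curn/logging.properties"))
```

The resulting string can be passed on the command line or placed in the CURN_JAVA_VM_ARGS environment variable.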

If you want to have exceptions mailed to you (which can be useful when running curn from a scheduler, such as cron(8)), then you have to use the Log4J LevelRangeFilter class to filter the messages going to individual Log4J appenders. You can't use a properties-based configuration file in this case, because Log4J's properties file configurator doesn't support filters. Instead, you must use an XML configuration file, such as the one shown below.

  


# Appendix A: Similar Products and Tools

Adam Sampson's rawdog (RSS Aggregator Without Delusions Of Grandeur) is similar in spirit, features, and invocation. It's written in Python. Like curn, rawdog is intended to be run from a scheduler such as cron.

curn uses the Java Mail API and the Java Beans Activation Framework, which are copyright © Sun Microsystems, Inc.

curn uses the Apache Jakarta Bean Scripting Framework (BSF), the Jakarta Commons Logging API, and the Apache Xerces XML parser API. All are copyright © The Apache Software Foundation.

[3] It's also possible, though hairy, to escape the special meaning of special characters via the backslash character. For instance, you can escape the variable substitution lead-in character, '$', with a backslash, e.g., "\$". This technique is not recommended, however, because you have to double-escape any backslash characters that you want to be preserved literally. For instance, to get "\t", you must specify "\\\\t". To get a literal backslash, specify "\\\\". (Yes, that's four backslashes, just to get a single unescaped one.) This double-escaping is a regrettable side effect of how the configuration file parser parses variable values: it makes two separate passes over the value, one for metacharacter expansion and another for variable expansion. Each of those passes honors and processes backslash escapes. This problem would go away if the configuration file parser parsed both metacharacter sequences and variable substitutions itself, in one pass. It doesn't currently do that, because I wanted to make use of the existing org.clapper.util.text.XStringBuffer class's decodeMetacharacters() method and the org.clapper.util.text.UnixShellVariableSubstituter class. In general, you're better off just sticking with single quotes. I may eventually fix this problem, but single quotes work now and will continue to work regardless.
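
The double-escaping described in this footnote can be illustrated with a toy model in Python. This is not the actual parser: the function names, the escape tables, and the order of the two passes are all assumptions, chosen only so that the model reproduces the examples given in the footnote. It does not call or imitate decodeMetacharacters() or UnixShellVariableSubstituter in detail.

```python
# Toy escape tables (assumed, for illustration only).
VARIABLE_ESCAPES = {"\\": "\\", "$": "$"}
METACHAR_ESCAPES = {"\\": "\\", "t": "\t", "n": "\n"}


def decode_pass(text, escapes):
    """Decode backslash escapes listed in `escapes`; leave others untouched."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair[0] == "\\" and pair[1] in escapes:
            out.append(escapes[pair[1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_value(raw):
    """Two passes, each consuming one level of backslash escaping."""
    first = decode_pass(raw, VARIABLE_ESCAPES)    # assumed to run first
    return decode_pass(first, METACHAR_ESCAPES)   # assumed to run second


print(repr(parse_value(r"\\\\")))   # four backslashes in, one out
print(repr(parse_value(r"\\\\t")))  # a literal backslash followed by t
print(repr(parse_value(r"\\t")))    # only two backslashes: a tab character
print(repr(parse_value(r"\$")))     # an escaped dollar sign
```

In this model, each pass halves a run of backslashes, which is why a single literal backslash needs four in the configuration value.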