curn: Customizable Utilitarian RSS Notifier
User's Guide
curn is an RSS reader. It scans a configured set of URLs, each one representing an RSS feed, and summarizes the results. By default, curn keeps track of individual items within each RSS feed, using an on-disk cache; when using the cache, it will suppress displaying information for items it has already processed (though that behavior can be disabled).
Unlike many RSS readers, curn does not use a graphical user interface. It is a command-line utility, intended to be run periodically in the background by a command scheduler such as cron(8) (on UNIX-like systems) or the Windows Scheduler Service (on Windows).
curn can read RSS feeds from any URL that's supported by Java's runtime. When querying HTTP sites, curn uses the HTTP If-Modified-Since and Last-Modified headers to suppress retrieving and processing feeds that haven't changed (though a plug-in that forces feed downloads, such as the Retain Articles plug-in, can override that capability). By default, it also requests that the remote HTTP server gzip the XML before sending it. (Some HTTP servers honor the request; some don't.) These measures both minimize network bandwidth and ensure that curn is as kind as possible to the remote RSS servers. (There are some additional steps you can take to be more bandwidth-friendly.)
curn comes with a built-in adapter for the ROME feed parser, which it uses by default, but it can easily be extended to use any RSS parser. See the ParserClass configuration item for information on how to specify which parser curn should use, and the section entitled Using an Unsupported RSS Parser for more details on adapting curn to use other RSS parsers.
curn supports several output formats; you can configure one or more output handlers in curn's configuration file. In addition, someone conversant with Java programming or comfortable with a scripting language, such as Python or Ruby, can easily extend curn to handle a new output format. See the section entitled Writing Your Own Output Handler for more details. Finally, as of version 2.6, curn has a built-in template-driven output handler, the FreeMarkerOutputHandler, based on the FreeMarker template engine. This handler uses a text template to generate output, so anyone conversant with FreeMarker can easily write a custom template to generate custom output. See the section describing the FreeMarkerOutputHandler for more details.
curn's predefined output handlers can generate:
In addition, curn supports emailing its output. If email addresses are specified in the configuration file, then curn creates a MIME multipart/alternative email message [1], using the output of each output handler as one of the alternative attachments. (As of version 3.2, curn can also send individual email messages for each article; see the MailIndividualArticles parameter.)
Throughout this document, the following terms are used:
curn is invoked from the command line as follows:
curn [options] config
The curn graphical installer automatically creates a Unix shell script (called curn) or a Windows command file (curn.bat) in the bin directory beneath the curn installation directory. You must put the curn bin directory in your path.
Note: While it is possible to invoke curn via the java command, it's not recommended. For curn's plug-ins to work properly, curn must do some fancy class loader footwork. Basically, curn uses a special bootstrap class to find all plug-ins and create a special class loader that can load everything—plug-ins, core code, etc. If you don't invoke curn via the bootstrap class, the plug-ins don't load properly. The curn shell script and command file handle invoking curn so that plug-ins will work properly.
curn's command line uses a UNIX-like syntax. If you invoke curn without any parameters, you get the following usage display.
    Usage: curn [options] config

    OPTIONS:

    -B, --build-info
        Show full build information, then exit. This option shows a bit
        more information than the --version option. This option can be
        combined with the --plug-ins option to show the loaded plug-ins.
    -C, --no-cache
        Don't use a cache file at all.
    -e, --config-encoding encoding
        The encoding to use when reading the configuration file. Default:
        The default encoding for the Java runtime on the current operating
        system.
    --logging
        Enable logging via Jakarta Commons Logging.
    -p, --plug-ins
        Show the list of located plug-ins and output handlers, then exit.
        This option can be combined with either --build-info or --version
        to show version information, as well.
    -t, --time <time>
        For the purposes of cache expiration, pretend the current time is
        <time>. <time> may be in one of the following formats.
            2009/10/25 01:14:41 PM
            2009/10/25 01:14:41
            2009/10/25 01:14 PM
            2009/10/25 01:14
            2009/10/25 1:14 PM
            2009/10/25 1:14
            2009/10/25 01 PM
            2009/10/25 1 PM
            2009/10/25 13:14:41
            2009/10/25 13:14
            2009/10/25
            09/10/25
            01:14:41 PM
            01:14:41
            01:14 PM
            01:14
            1:14 PM
            1:14
            01 PM
            1 PM
            13:14:41 PM
            13:14:41
            13:14 PM
            13:14
    -U, --allow-undefined-cfg-vars
        Don't abort when an undefined variable is encountered in the
        configuration file; substitute an empty string, instead. Normally,
        an undefined configuration variable will cause curn to abort.
    -u, --no-update
        Read the cache, but don't update it.
    -v, --version
        Show version information, then exit. This option can be combined
        with the --plug-ins option to show the loaded plug-ins.

    PARAMETERS:

    config
        Path or URL to configuration file
Many of curn's command-line options simply override settings in the curn configuration file. Each option and argument is discussed in more detail, below.
OPTIONS | ||
---|---|---|
Short Option | Long Option | Explanation |
-B | --build-info | Display detailed information about how and when curn
was built, then exit without doing anything. Useful primarily
when debugging or submitting problem reports. For instance,
the command
curn -B
produces output similar to the following:
curn, version 3.0 (build 20060608.185936.321)
Build: 20060608.185936.321
Build date: 2006/06/08 14:59:36 EDT
Built by: bmc on sunball.inside.clapper.org
Built on: Linux 2.6.16-1.2122_FC5smp (i386)
Build Java VM: Java HotSpot(TM) Client VM 1.5.0_07-b03 (Sun Microsystems Inc.)
Build compiler: javac
Ant version: Apache Ant version 1.6.5 compiled on June 2 2005
For a simple one-line version display, use the --version option. |
-C | --no-cache | Run without a cache. Each RSS item curn encounters will appear to be new and will be passed to the output handlers. Also see the CacheFile configuration directive. |
-e encoding | --config-encoding encoding | Specify the encoding of the configuration file. The specified encoding can be any of the encodings supported by the underlying Java virtual machine. If you don't specify an encoding, curn will use the default encoding for the Java virtual machine. On Unix systems in the United States and western Europe, this is usually "ISO-8859-1"; on Windows systems, it is typically "Cp1252". |
| --logging | Enable logging via the java.util.logging API. You will also have to specify a logging configuration file via a -Djava.util.logging.config.file system property. For instance:
java -Djava.util.logging.config.file=/tmp/logging.properties org.clapper.curn.Tool --logging ...
See the section entitled Logging for more details on specifying logging parameters. |
-t <time> | --time <time> | For the purposes of cache expiration, pretend the current time is <time>, instead of the wall clock time. <time> may be specified in any of the formats listed in the usage display above. |
-u | --no-update | Load (and prune) the cache file before processing the RSS feeds, but do not save the modified in-memory cache back to disk. Useful primarily for debugging. |
-v | --version | Show just the one-line version information, then exit. For more detailed curn build and version information, use the --build-info option. |
A list of curn's positional parameters follows.
PARAMETERS | ||
---|---|---|
Positional Parameter | Explanation | |
config | The path or URL to the curn configuration file. This parameter is required. |
curn's configuration file controls all aspects of curn's behavior: its global parameters, the output handlers, and the individual RSS feed sites. This section first describes the overall configuration file syntax, and then describes each curn configuration item in detail.
You can view a sample curn configuration file by following this link.
curn's configuration file is a simple text file. It resembles a standard Java properties file, but it is broken into individual sections, each of which has its own variable namespace. At a glance, the configuration file is reminiscent of a Windows .INI file, but there are quite a few differences [2].
As in a .INI file, each section in the configuration file begins with a name surrounded by brackets. Each section contains variable assignments; the variable assignment syntax is similar to that of a Java properties file. For example:
    [curn]
    CacheFile: /home/bmc/.curn/cache
    DaysToCache: NoLimit
    ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
    ...
There can be any amount of whitespace before and after the brackets in a section name; the whitespace is ignored. That is, "[curn]", "[ curn]" and "[ curn ]" all specify a section named "curn".
Each section contains zero or more variable settings. Similar to a Java properties file, the variables are specified as name/value pairs, separated by an equals sign ("=") or a colon (":"). Variable names are case-sensitive and may contain any printable character (including white space), other than '$', '{', and '}'. Variable values may contain anything at all. The parser ignores whitespace on either side of the "=" or ":"; that is, leading whitespace in the value is skipped. The way to include leading whitespace in a value is to escape the whitespace characters with backslashes. (See below.)
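As a rough sketch of these rules, the following fragment shows both separators, ignored whitespace around the separator, and backslash-escaped leading whitespace. The section name and variable names are hypothetical, chosen only for illustration.

```
[test]
# "=" and ":" are interchangeable separators; both variables are set to "one".
first = one
second: one
# Whitespace on either side of the separator is skipped; the value is still "one".
third   :     one
# Escaping the leading spaces with backslashes keeps them: the value is "  one".
fourth: \ \ one
```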
Variable definitions may span multiple lines; each line to be continued must end with a backslash ("\") character, which escapes the meaning of the newline, causing it to be treated like a space character. The following line is treated as a logical continuation of the first line; however, any leading whitespace is removed from continued lines. For example, the following four variable assignments all have the same value:
    [test]
    a: one two three
    b:            one two three
    c: one two \
       three
    d:        one \
              two \
              three
Because leading whitespace is skipped, all four variables have the value "one two three".
Only variable definition lines may be continued. Section header lines, comment lines (see below) and include directives (see below) cannot span multiple lines.
The configuration parser preprocesses each variable's value, expanding embedded metacharacter sequences and substituting variable references. (See below.) You can use backslashes to escape the special characters that the parser uses to recognize metacharacter and variable sequences; you can also use single quotes. See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details.
Within a variable's value, Java-style ASCII escape sequences \t, \n, \r, \\, \", \', \ (a backslash and a space), and \uxxxx are recognized and converted to single characters. Note that metacharacter expansion is performed before variable substitution.
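A small illustration of the escape sequences listed above, assuming the behavior described in this section. The section and variable names are hypothetical.

```
[test]
# \t expands to a tab character between the two words.
header: Title\tDate
# \u00e9 expands to the single character "é".
cafe: caf\u00e9
# \\ expands to a single literal backslash.
separator: one\\two
```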
A variable's value can interpolate the values of other variables, using a variable substitution syntax reminiscent of the Unix shell. (The syntax is also similar to the ant variable substitution syntax.) The general form of a variable reference is ${sectionName:varName}. sectionName is the name of the section containing the variable to substitute; if omitted, it defaults to the current section. varName is the name of the variable to substitute. If the variable has an empty value, an empty string is substituted. If the variable (or the referenced section) does not exist, curn will abort, unless the --allow-undefined-cfg-vars command-line option is specified, in which case an empty string is substituted. If a variable reference specifies a section name, the referenced section must precede the current section. It is not possible to substitute the value of a variable in a section that occurs later in the file.
The section names "system", "env", and "program" are reserved for special "pseudosections."
The "system" pseudosection is used to interpolate values from Java's system properties (the values returned by System.getProperties()). For instance, ${system:user.home} substitutes the value of the user.home system property (typically, the home directory of the user running curn). Similarly, ${system:user.name} substitutes the user's name.
The "env" pseudosection is used to interpolate values from the environment. On UNIX systems, for instance, ${env:HOME} substitutes the user's home directory (and is, therefore, a synonym for ${system:user.home}). On some versions of Windows, ${env:USERNAME} will substitute the name of the user running curn. Note: On UNIX systems, environment variable names are typically case-sensitive; for instance, ${env:USER} and ${env:user} refer to different environment variables. On Windows systems, environment variable names are typically case-insensitive; ${env:USERNAME} and ${env:username} are equivalent.
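For illustration, a fragment along the following lines uses both pseudosections. CacheFile is a real curn parameter; feedDir is a hypothetical helper variable introduced only for this sketch.

```
[curn]
# Java system property: typically the home directory of the user running curn.
CacheFile: ${system:user.home}/.curn/cache
# Environment variable: on UNIX systems, HOME usually names the same directory.
# (feedDir is a hypothetical helper variable, not a curn parameter.)
feedDir: ${env:HOME}/.curn/feeds
```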
The "program" pseudosection is a placeholder for various special variables provided by the Configuration class at runtime. Those variables are:
"program" Section Variable | Explanation | ||||||
---|---|---|---|---|---|---|---|
cwd | The program's current working directory. Thus, ${program:cwd} will substitute the current working directory, with an appropriate path separator for the host operating system (e.g., "\" for Windows, "/" for UNIX.) | ||||||
cwd.url | The program's current working directory, as a file URL, without the trailing "/". Useful when you need to create a URL reference to something relative to the current directory. This is especially helpful on Windows, where file://${program:cwd}/something.txt produces an invalid URL, with a mixture of backslashes and forward slashes. By contrast, ${program:cwd.url}/something.txt always produces a valid URL, regardless of the underlying host operating system. | ||||||
now | The current time, formatted by calling java.util.Date.toString() with the default locale. For example, ${program:now} would produce something like "Fri Aug 20 15:18:56 EDT 2004" on a machine with a default English locale. | ||||||
now delim fmt [delim lang delim country] | The current date/time, formatted with the specified java.text.SimpleDateFormat format string. If specified, the given language and country code will be used as the locale; otherwise, the default system locale will be used. lang is a Java language code, such as "en", "fr", etc. country is a 2-letter country code, e.g., "GB", "US", "CA", etc. delim is a user-chosen delimiter that separates the variable name ("now") from the format and the optional locale fields. The delimiter can be anything that doesn't appear in the format string, the variable name, or the locale. Note: SimpleDateFormat requires that literal strings (i.e., strings that should not be processed as part of the format) be enclosed in single quotes. For instance: yyyy.MM.dd 'at' hh:mm:ss z Because single quotes are special characters in configuration files, it's important to escape them if you use them inside date formats. So, to include the above string in a configuration file's ${program:now} reference, using "/" as the delimiter, use the following: ${program:now/yyyy.MM.dd \'at\' hh:mm:ss z} See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details. |
For example:
Variable Reference | Explanation | Sample |
---|---|---|
${system:user.home} | Substitutes the value of the system property "user.home" (usually set to the current user's home directory). | [curn] myCurnDir = ${system:user.home}/.curn |
${curn:myCurnDir} | Substitutes the value of variable "myCurnDir" from the [curn] section. | [Feed_Wired] URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf SaveAs: ${curn:myCurnDir}/feeds/wired.rdf |
${myCurnDir} | Substitutes the value of variable "myCurnDir" from the current section. | [curn] myCurnDir = ${system:user.home}/.curn CacheFile = ${myCurnDir}/cache |
The configuration file also supports a simple conditional-substitution logic, which allows you to specify a default value to be substituted if a variable is empty or does not have a value. The general form of a conditional substitution is:
${var?some default value}
If ${var} does not have a value, or has an empty string as its value, the string "some default value" will be substituted.
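A minimal sketch of conditional substitution, assuming the semantics described above. Here baseDir is a hypothetical variable that is deliberately left undefined.

```
[curn]
# baseDir is a hypothetical variable that is not defined anywhere, so the
# default after the "?" is substituted and CacheFile becomes /tmp/curn.cache.
CacheFile: ${baseDir?/tmp}/curn.cache
```

If baseDir were later defined with a non-empty value earlier in the same section, that value would be used in place of /tmp.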
To prevent the parser from interpreting metacharacter sequences, variable substitutions and other special characters, enclose part or all of the value in single quotes. (See [3] for additional comments.) For example, suppose you want to set variable "prompt" to the literal value "Enter value. To specify a newline, use \n." The following configuration file line will do the trick:
prompt: 'Enter value. To specify a newline, use \n'
Similarly, to set variable "abc" to the literal string "${foo}" suppressing the parser's attempts to expand "${foo}" as a variable reference, you could use:
abc: '${foo}'
To include a literal single quote, you must escape it with a backslash.
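For instance, the quoting rules might be combined as in the fragment below. The section and variable names are hypothetical, and the escaped quote follows the pattern this guide uses for date formats.

```
[test]
# Single quotes suppress expansion, so the value is the literal text ${HOME}\n
literal: '${HOME}\n'
# A backslash-escaped single quote is kept as a literal quote character.
title: Today\'s headlines
```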
Regardless of the underlying operating system, path names in the curn configuration file can always use Unix-style forward slash ("/") characters. At runtime curn will convert the path names to use the appropriate file separator (e.g., "\" on Windows). This capability provides two benefits:
A special include directive permits inline inclusion of another configuration file. The include directive takes two forms:
%include "path"
%include "URL"
For example:
%include "/home/bmc/mytools/common.cfg"
%include "file:///home/bmc/mytools/common.cfg"
The included file may contain any content that is valid for this parser. It may contain just variable definitions (i.e., the contents of a section, without the section header), or it may contain a complete configuration file, with individual sections. Since the parser recognizes a variable syntax that is essentially identical to Java's properties file syntax, it's also legal to include a properties file, provided it's included within a valid section.
Attempting to include a file from itself, either directly or indirectly, will cause curn to abort processing.
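As a sketch of including bare variable definitions inside a section: this assumes the included file contains only variable assignments (no section header) and that it defines a variable named myCurnDir, which is an assumption made for this example.

```
[curn]
# Pull shared variable definitions into the [curn] section.
%include "/home/bmc/mytools/common.cfg"
# myCurnDir is assumed to be defined by the included file.
CacheFile: ${myCurnDir}/cache
```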
A comment line is one whose first non-whitespace character is a "#" or a "!". This comment syntax is identical to the one supported by a Java properties file. A blank line is a line containing no content, or one containing only whitespace. Blank lines and comments are ignored. For example:
    [curn]
    # ---------------------------------------------------------------------------
    # CacheFile: The full path to the file in which curn should cache URLs.
    #            curn uses the cache file to keep track of which URLs it
    #            has already received and displayed, and when it received them.
    #            Under normal operation, curn won't display a URL it has
    #            already displayed and cached.
    #
    #            This path may contain the ~ metacharacter, to denote the
    #            invoking user's home directory.
    #
    #            The use of a cache can be disabled by omitting this parameter.
    #            Use the "NoCacheUpdate" parameter to tell curn to read,
    #            but not update, the cache.
    #
    # See also:  Configuration parameter "NoCacheUpdate"
    #            Command line parameter -C, --nocache
    #
    # OPTIONAL. Default: None

    CacheFile: test.cache
curn's configuration file has three kinds of sections:
All other sections in the configuration file are parsed (and subject to syntactic constraints), but otherwise ignored. Thus, it's perfectly legal to have a separate section, e.g., "[var]", where you define variables that exist solely to be substituted into other sections.
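A sketch of this pattern follows. The [vars] section name and the baseDir variable are hypothetical; the feed section reuses the Wired example from earlier in this guide. Because a referenced section must precede the section that uses it, [vars] comes first.

```
[vars]
# Hypothetical helper section: curn ignores it, but later sections can
# reference its variables.
baseDir: ${system:user.home}/.curn

[curn]
CacheFile: ${vars:baseDir}/cache

[Feed_Wired]
URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf
SaveAs: ${vars:baseDir}/feeds/wired.rdf
```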
Any boolean parameter (i.e., one documented as taking a true or false value) can also take a value of "0" (false), "1" (true), "no" (false) or "yes" (true).
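For example, the following fragment uses each accepted spelling with boolean parameters documented in this guide. The particular settings are illustrative only.

```
[curn]
# All three lines below enable their respective options.
ShowDates: true
ShowAuthors: yes
ShowRSSVersion: 1
# All three lines below disable their respective options.
Quiet: false
NoCacheUpdate: no
AllowEmbeddedHTML: 0
```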
This section contains the global parameters. Each is described in detail, below. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)
Variable | Argument type | Description | Required? | Default value | See also |
---|---|---|---|---|---|
AllowEmbeddedHTML plug-in |
Boolean | Default setting for whether or not to allow
embedded HTML in certain RSS feed elements, such as description,
author, etc. Some RSS formats permit embedded HTML. Setting this
parameter to true preserves any embedded
HTML markup within a feed; setting this parameter to
false causes embedded HTML to be stripped.
Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML. This global parameter can be overridden on a per-feed basis. Notes:
|
No | false | |
CacheFile | File name or path name | The full path to the file in which curn
should cache feed item data. curn uses the cache file to
keep track of which feed items it has already received and
displayed, and when it received them. Under normal operation,
curn won't display a feed item it has already displayed
and cached.
The use of a cache can be disabled by omitting this parameter. Use the NoCacheUpdate parameter, or the --no-update command line option, to tell curn to read, but not update, the cache. The cache file is an XML file. However, since it is generated automatically, you should not edit it. |
No | None. (If not specified, no cache is used.) |
NoCacheUpdate CacheBackup --no-cache --no-update |
CacheBackup | File name or path name. |
The full path to a cache backup file. If this
parameter is defined, curn will copy the cache to this
backup file before updating the cache on disk.
Warning: This parameter was replaced with TotalCacheBackups in curn version 2.6. |
No | None. |
CacheFile TotalCacheBackups |
CommonXMLFixups plug-in |
Boolean | Enables or disables the Common XML Fixups plug-in,
which attempts to fix common syntax problems in downloaded XML feeds.
There is some XML badness that is surprisingly common across feeds,
including (but not limited to):
This global parameter can be overridden on a per-feed basis. This global setting defines the default value for all feeds that don't explicitly set it themselves. |
No | false | The per-feed CommonXMLFixups setting |
DaysToCache | Positive integer | Default maximum number of days to cache an already-read item. This parameter is used when the configuration section for a particular site lacks its own DaysToCache value. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless (i.e., 0 ensures that curn always forgets items that are cached). The special value "NoLimit" causes curn to leave items in the cache forever. | No | 365 (days) | Per-feed DaysToCache parameter |
GzipDownload plug-in |
Boolean |
If set to true, this parameter directs curn to use the
"Accept-Encoding: gzip"
HTTP header when retrieving an RSS feed from an HTTP server.
Since RSS feeds are XML, they typically compress well;
retrieving gzipped data, rather than the uncompressed XML, can
save a significant amount of time and network bandwidth. (Note,
however, that HTTP servers are not obligated to honor a request
to gzip the feed.) This parameter can be
overridden on a per-feed basis.
This global value sets the default value.
For backward compatibility, this parameter can also be specified as GetGzippedFeeds. |
No | true | |
IgnoreArticlesOlderThan plug-in |
String | Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. For instance:
IgnoreArticlesOlderThan: 3 days
IgnoreArticlesOlderThan: 1 week
IgnoreArticlesOlderThan: 365 days
IgnoreArticlesOlderThan: 12 hours, 30 minutes
Interval names are written in English, as in the examples above. "year" and "month" are not supported, to avoid the irregularity of leap years and different month lengths, respectively. The actual conversion of the strings is done by the org.clapper.util library's Duration class. See that class for more details. This global value sets the default value. NOTE: The plug-in that implements this capability uses the timestamp in the XML to determine "older than", not the cached timestamp, because the intent is to weed old articles from a feed that you haven't processed in a while (or perhaps are processing for the first time). If the article has no timestamp in the XML, it is assumed to be current, i.e., to have a date/time of "now". |
No | None (i.e., articles are not ignored based on age) | Per-feed IgnoreArticlesOlderThan parameter |
MailOutputTo plug-in |
String | One or more comma-separated email addresses to receive the output. This parameter is optional. If any email addresses are specified, then curn sends its generated output to those addresses. Depending on the setting of the MailIndividualArticles parameter, curn either sends a single MIME multipart/alternative email with all the output, or it sends one message per article found in the feeds. See MailIndividualArticles for details. | No | Output is not emailed. |
SMTPHost SMTPLocalhost MailFrom MailSubject |
MailFrom plug-in |
String | The email address to use as the sender, when mailing output. The address can be a full RFC 2822-compliant address (e.g., "Joe Blow <joe@example.org>") or just a simple address (e.g., "joe@example.org"). This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | curn constructs its own "from" address from the user name associated with the running process and the current host name. |
SMTPHost SMTPLocalhost MailSubject MailOutputTo |
MailSubject plug-in |
String | The subject line to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | curn output |
SMTPHost SMTPLocalhost MailFrom MailOutputTo |
MailIndividualArticles plug-in |
Boolean | If set to true, this parameter instructs curn to send an
email per article; that is, instead of a single email containing
the output from all output handlers, curn will send one
individual email for each article. If curn finds 20
unread articles, it'll send 20 email messages, each with a single
article; if there are 100 unread articles, curn will send
100 separate email messages. If there are multiple output handlers
that actually produce output, then each article email will be a
MIME multipart/alternative email containing separate attachments from
each output handler for that article.
If this parameter is false or absent, curn will send one email containing the generated output for all feeds and items. If there are multiple output handlers that actually produce output, curn will combine all the outputs into a single MIME multipart/alternative email. Each output handler's output will be a separate multipart/alternative attachment. (curn assumes that each output handler is generating an alternate form of the same information.) Output handlers that don't generate output are skipped. If none of the configured output handlers generate any output, then curn doesn't send an email message. This parameter is ignored if no email addresses are specified by the MailOutputTo parameter. WARNINGS:
|
No | false |
SMTPHost SMTPLocalhost MailFrom MailSubject |
MaxArticlesToShow
plug-in |
Integer | Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. This parameter can be overridden on a per-feed basis. This global parameter sets the default value. | No | None (i.e., no maximum) | |
MaxSummarySize plug-in |
Positive integer | If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter can be overridden on a per-feed basis. This global parameter sets the default value. | No | 0 (i.e., no limit on summary size) | ReplaceEmptySummaryWith |
MaxThreads | Positive integer | Defines the number of concurrent download threads. If this value is greater than 1, then curn will spawn that many worker threads to handle the downloading and parsing of the RSS feeds concurrently. If this value is 1, curn will process the feeds sequentially. If this value is greater than 1, but less than the total number of feeds, some of the worker threads will end up processing more than one feed (sequentially). Values less than 1 are illegal. | No | 5 | |
NoCacheUpdate | Boolean | If set to true (and if a cache file is specified), this parameter tells curn to read the cache file and honor its contents, but not to save the modified in-memory cache back to disk. | No | false |
CacheFile --no-update |
ParserClass | String |
The full name of the underlying RSS parser class to be used.
This class must implement the
org.clapper.curn.parser.RSSParser
interface. It can be a first-class parser of its own, or
it can be nothing more than an adapter for a third party
RSS parser class.
curn comes bundled with one parser:
Any class that implements org.clapper.curn.parser.RSSParser may be used as a value for ParserClass. |
No | org.clapper.curn.parser.rome.RSSParserAdapter | |
Quiet | Boolean | Normally, if an RSS feed contains no new items, most curn output handlers display the site's name and URL, followed by something like "No new items." Similarly, if curn can't contact a feed site, or if the site's XML is unparseable, curn displays an error message. This option tells curn to silently ignore sites with no data or bad XML. Setting Quiet to true tells curn to suppress both of the above displays. | No | false |
--quiet --no-quiet |
ReplaceEmptySummaryWith plug-in |
String |
Tells curn what to do when the summary for a feed
article is missing. Legal values:
|
No | nothing | Per-feed ReplaceEmptySummaryWith parameter |
ShowArticlesFor | String |
How long to show articles from feeds. If specified, this parameter is only used when individual feeds don't specify a ShowArticlesFor parameter of their own. The value is a time interval, expressed using the same natural language strings supported by the IgnoreArticlesOlderThan parameter. For instance:
ShowArticlesFor: 3 days
ShowArticlesFor: 1 week
ShowArticlesFor: 365 days
ShowArticlesFor: 12 hours, 30 minutes
Interval names are written in English, as in the examples above.
NOTE: The plug-in that implements this capability uses the timestamp in the curn cache when aging an article, not the timestamp in the feed's XML. That's because the intent of this configuration parameter is to permit you to keep showing an article for a certain amount of time after the article was first displayed. The article timestamp in the XML is the time that the article was published, not the time that curn first displayed it. The time in the curn cache represents the time that curn first saw (and presumably displayed) the article. WARNINGS:
|
No | 1 millisecond (i.e., show each article once) | Per-feed ShowArticlesFor parameter |
ShowAuthors plug-in |
Boolean | If set to true, this configuration item instructs curn to display author information for each feed item, if available. This global value can be overridden on a per-feed basis. | No | false | |
ShowDates plug-in |
Boolean | Some RSS feeds or the individual items within each feed contain dates (usually corresponding to the publication dates for the feed or item). If this option is set to true, then curn will display the date for each item that provides a date. This global value can be overridden on a per-feed basis. | No | false | |
ShowRSSVersion | Boolean | Display the RSS version for each feed. | No | false | |
SummaryOnly plug-in |
Boolean |
Some RSS feeds provide a description for each item, in addition
to the (brief) title. Setting SummaryOnly
to true suppresses display of the description. This parameter
can be overridden on a per-feed basis.
This global value sets the default value.
WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead. |
No | false | ReplaceEmptySummaryWith |
SMTPHost plug-in |
String | The SMTP host to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | localhost | per-feed ReplaceEmptySummaryWith parameter |
SMTPLocalhost plug-in |
String | The name to use to identify the local host when sending email. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | The canonical name of the local host | per-feed ReplaceEmptySummaryWith parameter |
SortBy plug-in |
String |
Default method to use to sort items within each feed. This
parameter is used when the configuration section for a particular
site lacks its own SortBy
value. Legal values:
|
No | none | Per-feed SortBy parameter |
TotalCacheBackups | Positive integer |
The total number of cache backup copies to keep. If this parameter
is greater than 0, then curn will keep that many numbered backups
of the cache. If the cache exists when curn attempts to update
it, curn will copy the existing cache to
cacheFile.0. If
cacheFile.0 exists, it will be moved to
cacheFile.1 first, and so on down the line,
until the maximum number of cache backup files exists.
The newest cache is always the one without a numeric extension.
The oldest file is the one with the largest numeric extension.
This parameter is useful if you want to roll back to a previous cache.
If this parameter is not specified, or is 0, then no cache backups are made. |
No | 0 | CacheFile |
UserAgent plug-in |
String | Specifies the default HTTP User-Agent header to use. This configuration parameter permits you to have curn masquerade as a known browser, for sites that refuse access to robots and spiders and other unknown web clients. This global value is used when the section for a particular feed does not supply its own UserAgent value. | No | A string that identifies curn as the user agent. | Per-feed UserAgent parameter. |
ZipOutputTo plug-in |
String | Path to a zip file to receive all output generated by output handlers. | No | None |
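The backup numbering described for TotalCacheBackups can be illustrated with a short sketch. This is not curn's actual code (curn is written in Java); it is a minimal Python approximation of the documented behavior, and the function name `rotate_backups` and the file name `curn.cache` are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def rotate_backups(cache_file: Path, total_backups: int) -> None:
    """Shift cacheFile.N to cacheFile.N+1, then copy cacheFile to cacheFile.0.

    Hypothetical helper: it mimics the numbering scheme described for
    TotalCacheBackups, not curn's real implementation.
    """
    if total_backups <= 0 or not cache_file.exists():
        return  # backups disabled, or there is no cache to back up yet

    def backup(n: int) -> Path:
        return cache_file.with_name(f"{cache_file.name}.{n}")

    # Start at the oldest slot so that no backup is overwritten before it
    # has been moved. Whatever sits in the last slot falls off the end.
    for n in range(total_backups - 1, 0, -1):
        if backup(n - 1).exists():
            backup(n - 1).replace(backup(n))

    # The current cache becomes the newest backup (extension .0).
    shutil.copy2(cache_file, backup(0))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "curn.cache"  # hypothetical cache file name
        for contents in ("run 1", "run 2", "run 3"):
            rotate_backups(cache, total_backups=2)  # back up before updating
            cache.write_text(contents)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

With TotalCacheBackups set to 2, the file without a numeric extension is always the newest cache, `.0` holds the previous one, and `.1` holds the oldest copy that is still kept.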
The curn configuration file also contains a list of RSS feeds to be polled. Each feed must be specified in its own section in the configuration file. The name of the section must start with the string "Feed". If more than one feed is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for RSS feed sections.
Each feed section supports the following parameters. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)
Variable | Argument type | Description | Required? | Default Value | ||||||
---|---|---|---|---|---|---|---|---|---|---|
AllowEmbeddedHTML plug-in |
Boolean | Whether or not to allow
embedded HTML in certain RSS feed elements, such as description,
author, etc, for this feed. Some RSS formats permit embedded HTML; setting this
parameter to true tells curn
output handlers that they should
preserve such embedded HTML markup, if possible. If this parameter
is false, any embedded HTML is stripped.
Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML. Notes:
|
No | false | ||||||
ArticleFilter plug-in |
Strings | Specifies a set of filters to discard feed item (article)
content, based on regular expressions.
The filtering syntax is (shamelessly) adapted from the rawdog RSS reader's article-filter plug-in. A feed filter is configured by adding an ArticleFilter property to the feed's configuration section. The property's value consists of one or more filter command sequences, separated by ";" characters. (The ";" must be surrounded by white space; see below.) Each filter command sequence is of this form:
show|hide [field 'regexp' [field 'regexp' ...]]
field can be one of:
Two command sequences separated by ";" look like this:
hide author 'Raymond Luxury-yacht' ; \
     show author 'Arthur +.Two-sheds. +Jackson'
If the command is "hide", then the entry will be hidden if the specified field matches the regular expression. If the command is "show", then the entry will be shown if the field matches the regular expression. If there are no fields or regular expressions, then the command is a wildcard match. That is:
hide
is equivalent to:
hide any '.*'
and:
show
is equivalent to:
show any '.*'
Wildcard matches are useful in situations where you want to hide or show "everything but ...". See the examples, below, for details. All filtering commands are processed, and the end result is what defines whether a given entry is suppressed or not. Regular expressions are matched in a case-blind fashion. The match logic also:
Examples
Some examples will help clarify the syntax. For example, the following command hides all articles with the phrase "mash-up" (because mash-ups bore me):
ArticleFilter: hide any 'mash[- \t]?up'
The following, more complicated, entry hides everything by author "Joe Blow", unless the title has the word "rant" in it ('cause his rants are hilarious):
ArticleFilter: hide author '^joe *blow$' ; \
               show author '^joe *blow$' title rant
Finally, this example hides everything except articles by Moe Howard:
ArticleFilter: hide ; show author '^moe *howard$' |
No | Articles are not filtered | ||||||
CommonXMLFixups plug-in |
Boolean |
Enables or disables the Common XML Fixups plug-in, which attempts
to fix common syntax problems in downloaded XML feeds. Among the
corrections this plug-in makes:
|
No | The value of the global CommonXMLFixups parameter in the [curn] section, or false, if that value is not set. | ||||||
DaysToCache | Positive integer | Maximum number of days to cache an already-read item for this feed. This value locally overrides the global DaysToCache default in the [curn] section. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless for this feed (i.e., 0 ensures that curn always forgets items that are cached for this feed). The special value "NoLimit" causes curn to leave items in the cache forever. | No | The value of the global DaysToCache parameter in the [curn] section or 365 if that value is not set. | ||||||
Disabled plug-in |
Boolean | If true, then the feed is skipped. If false, the feed is processed. This variable provides a simple way to disable a feed without having to comment its entire section out. | No | false | ||||||
EditFeedURL EditItemURL plug-in |
String |
Apply the specified regular expression edit to the site's
feed URL (EditFeedURL) or to each of the
site's RSS item URLs (EditItemURL).
The value for this option consists of
a Perl 5-style substitution applied to the URL. For example:
Remove all the parameters from the URL:
's/\?.*$//'
(The PruneURLs parameter provides a simpler mechanism for this common operation.)
Remove a "redirect" CGI from a site whose URLs look like http://www.example.com/redir.cgi?http://...
's+http://www.example.com/redir.cgi\?++'
The substitution syntax supports Perl's $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:
s/^([a-z]+)foo(.*)\$/\$1bar\$2/
or
's/^([a-z]+)foo(.*)$/$1bar$2/'
If there are backslashes in the string, you must escape them, as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression
s/^\*.*$//
you must specify
's/^\*.*$//'
This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:
The modifiers can be concatenated. Thus, 's/abc/xyz/ig' will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case. Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level. |
No | None | ||||||
ForceEncoding | String | Force curn to ignore the character set
encoding advertised by the remote server (if any), and use the
character set specified by this configuration item, instead.
This is useful in the following cases:
This value should be a character set encoding that is recognized by the Java runtime environment. ForceCharacterEncoding is a synonym for this parameter, retained for backward compatibility. |
No |
|
||||||
GzipDownload plug-in |
Boolean | If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving this RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter overrides the global GzipDownload parameter. | No | true | ||||||
IgnoreArticlesOlderThan plug-in |
String | Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. Please see the documentation for the global IgnoreArticlesOlderThan parameter for a more complete description of this parameter. | No | The default, as defined by the global IgnoreArticlesOlderThan parameter. If no global IgnoreArticlesOlderThan value is set, then articles aren't ignored based on their age. | ||||||
IgnoreDuplicateTitles plug-in |
Boolean |
If true, curn will ignore any item
whose title matches the title of another item in the feed. (It
only compares titles within the feed itself; it does not
compare against titles of cached items.) Titles are compared
without regard to upper or lower case.
This feature (hack, really) is useful for sites whose feeds often contain duplicate items (with the same titles) that have different IDs and different URLs, and thus appear to be unique. (Yahoo! News feeds sometimes exhibit this trait.) |
No | false | |
MaxArticlesToShow
plug-in |
Integer | Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. | No | The default, as defined by the global MaxArticlesToShow parameter. If no global MaxArticlesToShow value is set, then there is no maximum. | ||||||
MaxSummarySize plug-in |
Positive integer | If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter overrides the global MaxSummarySize parameter. | No | 0 (i.e., no limit on summary size) | ||||||
PreparseEditsuffix plug-in |
String |
A parameter in a Feed section that starts with
PreparseEdit (e.g.,
PreparseEdit1,
PreparseEditFoo, etc.)
defines a substitution to be applied to the downloaded XML
file before it is parsed. As with the
EditItemURL and
EditFeedURL
options, the value for this option consists of a
Perl 5-style substitution.
This capability is rarely needed, but it's sometimes useful for sites that serve unparseable, but easily fixed, XML. (The CommonXMLFixups capability covers a lot of these errors with less configuration, though.) For instance, one news site I read has an RSS channel whose title always contains an unescaped "&". The XML parser will not parse that feed; however, a simple preparse edit command of:
's/ & / \&amp; /g'
fixes the problem. (Again, this is one of the common XML syntax errors that CommonXMLFixups will correct.)
Another use for PreparseEdit is fixing incorrectly formatted links in the RSS feed. Consider the following <link> element, for fictitious site news.example.com:
<link>http://news.example.com&article=12573</link>
This is a perfectly parseable URL, but it happens to be wrong. It's missing a "/" between ".com" and "&". It really ought to be:
<link>http://news.example.com/&article=12573</link>
A quick PreparseEdit rule can fix it, though:
PreparseEdit: 's|(news.example.com)([^/]+)|$1/$2|'
Note the use of a different delimiter in the edit command ("|", instead of "/"). Any non-alphabetic character will work.
Multiple instances of this parameter are permitted, as long as each instance's name begins with the string "PreparseEdit" and contains a unique suffix.
The substitution syntax supports Perl-style $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:
s/^([a-z]+)foo(.*)\$/\$1bar\$2/
or
's/^([a-z]+)foo(.*)$/$1bar$2/'
If there are backslashes in the string, you must escape them, as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression
s/^\*.*$//
you must specify
's/^\*.*$//'
This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:
The modifiers can be concatenated. Thus, 's/abc/xyz/ig' will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case. Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level. |
No | None | ||||||
PruneOriginalRSSTo plug-in |
[options] Path | If set, this parameter specifies that the original, unparsed feed should be pruned to contain only new items, then written back out to the specified file. This approach differs from that of SaveAsRSS in that it operates on the raw, unparsed feed data; SaveAsRSS, by contrast, regenerates its RSS output from the parsed RSS feed data. As a result, SaveAsRSS will sometimes lose non-standard RSS XML markup. PruneOriginalRSSTo is less likely to do that, since it operates at an XML level, not an RSS level. This configuration item takes a command line-style value:
PruneOriginalRSSTo: [--backups total_backups] [--encoding encoding] path
or
PruneOriginalRSSTo: [-b total_backups] [-e encoding] path
The parameters have the following meanings:
| No | None | ||||||
PruneOriginalRSSOnly plug-in |
Boolean | If set, and if PruneOriginalRSSTo is also set, then the feed will be downloaded and parsed, and the pruned RSS output will be generated, but the feed will not be passed to any output handlers (or, for that matter, any other plug-ins). | No | false | ||||||
PruneURLs plug-in |
Boolean | Specifies that all URLs should be pruned of their HTTP parameters. This action can also be accomplished with EditItemURL and EditFeedURL directives; PruneURLs is convenient shorthand for a common operation. | No | None | ||||||
ReplaceEmptySummaryWith plug-in |
String |
Tells curn what to do when the summary for a feed
article is missing. Legal values:
|
No | nothing | ||||||
SaveAs plug-in |
[options] Path | If set, this parameter specifies the path to
a file where curn should save the raw XML contents of
the feed, whenever it downloads the feed. This can be useful
if you have a master version of curn that downloads
a bunch of feeds, with multiple slave versions of curn
that then run against the downloaded files. (See
Being Bandwidth Friendly for a more
detailed discussion of this tactic.)
This configuration item takes a command line-style value:
SaveAs: [--backups total_backups] [--encoding encoding] path
or
SaveAs: [-b total_backups] [-e encoding] path
The parameters have the following meanings:
|
No | None | ||||||
SaveAsEncoding plug-in |
String | If set, and if the
SaveAs parameter is also
set, then this parameter specifies the character
encoding to use when saving the feed to the file.
If SaveAs is not set for the feed,
then any SaveAsEncoding parameter is
ignored. WARNING: This parameter is deprecated. Use the --encoding option to the SaveAs parameter, instead. | No | "utf-8". Note that this default value is the same as the default value of the ForceEncoding, for file URLs. This makes it easy to have one instance of curn save RSS feeds for other instances to parse. | ||||||
SaveOnly plug-in |
Boolean | If set, and if SaveAs is also set, then the feed will be downloaded and saved, but not parsed and not included in the generated output. This parameter can be useful when Being Bandwidth Friendly. | No | false | ||||||
SaveAsRSS plug-in |
[options] Path | If set, this parameter specifies that the feed should be rewritten in the specified
RSS format and saved to the specified file. This configuration item takes a command line-style value:
SaveAsRSS: [--backups total_backups] [--type rsstype] [--encoding encoding] path
or
SaveAsRSS: [-b total_backups] [-t rsstype] [-e encoding] path
The parameters have the following meanings:
| No | None | ||||||
SaveRSSOnly plug-in |
Boolean | If set, and if SaveAsRSS is also set, then the feed will be downloaded and parsed, and the RSS output will be generated, but the feed will not be passed to any output handlers (or, for that matter, any other plug-ins). | No | false | ||||||
SavedBackups | Positive integer | Number of saved backups to keep. If this value is non-zero, the handler will back the SaveAs file up before overwriting it. Up to SavedBackups total backed-up files will be kept. A value of 0 disables the feature. | No | 0 | ||||||
ShowArticlesFor
plug-in |
String |
How long to show articles from the feed. The value is a time
interval, expressed using the same natural language strings supported
by the IgnoreArticlesOlderThan
parameter. Please see the documentation for the global
ShowArticlesFor parameter for a more
complete description of this parameter.
This value overrides the global ShowArticlesFor parameter. |
No | The value of the global ShowArticlesFor parameter. | ||||||
ShowAuthors
plug-in |
Boolean | If set to true, this configuration item instructs curn to display the author of each item in this feed, if available. This value overrides the global ShowAuthors parameter. | No | The value of the global ShowAuthors parameter. | ||||||
ShowDates
plug-in |
Boolean | If set to true, this configuration item instructs curn to display any dates associated with this feed, if available. This value overrides the global ShowDates parameter. | No | The value of the global ShowDates parameter. | ||||||
SortBy plug-in |
String |
How to sort items in this feed. This value locally overrides the
global SortBy parameter
in the [curn] section.
Legal values:
|
No | The value of the global SortBy parameter in the [curn] section. | ||||||
SummaryOnly plug-in |
Boolean |
Some RSS feeds provide a description for each item, in addition
to the (brief) title. Setting SummaryOnly
to true suppresses display of the description. This parameter
overrides the global
SummaryOnly parameter.
WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead. |
No | The value of the global SummaryOnly parameter. | ||||||
TitleOverride plug-in |
String | Specifies a string to be used as the site's title, instead of the title supplied in the RSS XML. Useful when the real site-supplied title is not suitable. | No | None | ||||||
URL | String | The fully-qualified URL for the feed. For local files, use a "file:" URL. | Yes | None | ||||||
UserAgent plug-in |
String | Specifies the HTTP User-Agent header to use when retrieving this feed. This local value overrides the global UserAgent parameter in the [curn] section. This configuration parameter permits you to have curn masquerade as a known browser, and it's useful for sites that refuse access to robots and spiders and other unknown web clients. | No | The value of the global UserAgent parameter in the [curn] section. |
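The Perl 5-style substitutions accepted by EditFeedURL, EditItemURL and PreparseEdit can be tried out in isolation before they go into a configuration file. The sketch below is only an approximation written in Python: curn itself runs on Java's regular expression engine, whose syntax differs in places, and the helper names `parse_substitution` and `apply_substitution` are invented for this example. It handles only the "i" and "g" modifiers that appear in the 's/abc/xyz/ig' example above, and it does not handle escaped delimiters inside the pattern.

```python
import re


def parse_substitution(expr: str) -> tuple[str, str, str]:
    """Split s<delim>pattern<delim>replacement<delim>flags into its parts.

    Simplification: the delimiter must not occur inside the pattern or
    the replacement, even in escaped form.
    """
    if len(expr) < 4 or expr[0] != "s":
        raise ValueError("expected s/pattern/replacement/flags")
    delim = expr[1]
    parts = expr[2:].split(delim)
    if len(parts) != 3:
        raise ValueError(f"expected exactly three {delim!r} delimiters")
    pattern, replacement, flags = parts
    return pattern, replacement, flags


def apply_substitution(expr: str, text: str) -> str:
    """Apply a Perl-style substitution to text, honoring the i and g flags."""
    pattern, replacement, flags = parse_substitution(expr)
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # for re.sub, 0 means "replace all"
    # Keep literal backslashes literal, then turn Perl's $1 into Python's \g<1>.
    replacement = replacement.replace("\\", "\\\\")
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, replacement, text, count=count, flags=re_flags)


if __name__ == "__main__":
    # Strip the query string from a URL.
    print(apply_substitution(r"s/\?.*$//", "http://www.example.com/feed?id=1"))
    # Insert the missing "/" using an alternate delimiter and group references.
    print(apply_substitution(
        "s|(news.example.com)([^/]+)|$1/$2|",
        "<link>http://news.example.com&article=12573</link>",
    ))
```

The expressions are written here as they would appear inside single quotes in the configuration file, that is, after curn's own variable and backslash processing has been suppressed.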
As curn processes each RSS feed, it parses the XML and loads the new items into internal data structures. When it has finished processing the XML, it hands the parsed data structures to one or more output handlers. Output handlers are so called because they generally produce output that's to be displayed or emailed to the user (generally, but not always). An output handler may choose to save its output to a file, but not send the output back to curn; each of the built-in output handlers does exactly that if its SaveAs configuration parameter is set and its SaveOnly configuration parameter is true. Alternatively, the output handler may choose to convert the internal data structures to output that it publishes somewhere (e.g., via a network connection to an HTTP server).
Each output handler is specified in its own section in the configuration file. The name of the section must start with the string "OutputHandler". If more than one output handler is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for output handler sections.
If no OutputHandler sections are present in the configuration file, curn skips the RSS XML parsing phase. (There's no reason to parse the XML if there are no output handlers to process the parsed feed data.) If there are no output handlers, curn may or may not download individual feeds. If a given feed has no SaveAs setting, and there are no output handlers, then curn skips the feed entirely. After all, there's no sense wasting time downloading the feed, if the feed isn't being parsed or saved. However, if the feed does have a SaveAs setting, curn will download and save the XML (assuming it has changed) even if XML parsing is disabled.
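The rules in the preceding paragraph amount to a small decision table. The sketch below restates them in Python purely as an illustration; the names `FeedPlan` and `plan_feed` are invented here and do not correspond to anything in curn's code. It deliberately ignores other settings that also influence processing, such as Disabled, SaveOnly, and the If-Modified-Since check.

```python
from typing import NamedTuple


class FeedPlan(NamedTuple):
    """What happens to one feed (hypothetical structure, for illustration)."""
    download: bool
    save_raw_xml: bool
    parse: bool


def plan_feed(has_output_handlers: bool, feed_has_save_as: bool) -> FeedPlan:
    # Parsing only makes sense if some output handler will consume the result.
    parse = has_output_handlers
    # A feed is worth downloading only if it will be parsed or saved.
    download = parse or feed_has_save_as
    # Raw XML is saved whenever the feed section has a SaveAs setting.
    return FeedPlan(download=download, save_raw_xml=feed_has_save_as, parse=parse)


if __name__ == "__main__":
    for handlers in (False, True):
        for save_as in (False, True):
            print(f"handlers={handlers}, SaveAs={save_as}:", plan_feed(handlers, save_as))
```

The one case that skips the feed entirely is the first: no output handlers and no SaveAs setting.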
All output handler sections take two variables. In addition, individual output handlers can require configuration items of their own. The two variables common to all output handlers are described below.
Variable | Argument type | Description | Required? | Default Value |
---|---|---|---|---|
Class | String | Identifies the Java class that implements the output handler. (The class must implement the org.clapper.curn.OutputHandler interface. See Writing Your Own Output Handler for details.) | Yes | |
Disabled | Boolean | If true, the output handler is skipped. If false, the output handler is processed. This variable provides a simple way to disable an output handler without having to comment its entire section out. | No | false |
There are some output handler examples following the next section.
curn comes bundled with the following built-in output handlers.
The FreeMarkerOutputHandler, introduced in curn version 2.6, is both simple and flexible. It uses the FreeMarker template engine to convert a template to an output file. FreeMarker templates can be used to generate nearly any kind of textual output file, from HTML and XML to simple text. In fact, the HTMLOutputHandler, TextOutputHandler, and SimpleSummaryOutputHandler have been reimplemented to use the FreeMarkerOutputHandler in conjunction with built-in templates that produce the appropriate kind of output.
Additional Configuration Items | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Variable | Argument type | Explanation | Required? | Default value | ||||||||
AllowEmbeddedHTML | Boolean | Whether or not the specified template supports embedded HTML. If embedded HTML is found within an RSS item, it will be included in the generated output only if (a) this parameter is true, and (b) the AllowEmbeddedHTML parameter for the feed is also true. Otherwise, embedded HTML will be stripped from the item. | No | false | ||||||||
Encoding | String | Specify the character encoding to use when writing the output file. | No | "utf-8" | ||||||||
SaveAs | File name or path name | Save a copy of the generated HTML to the specified file. The argument is the path to the file. WARNING: The syntax of this parameter is different from the syntax of the SaveAs parameter for a feed. | No | None (i.e., no copy is saved) | ||||||||
SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated HTML, but don't make it available to the user. (i.e., Don't display it on standard output, and don't email it.) | No | false | ||||||||
ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated HTML. | No | true | ||||||||
TemplateFile | Two strings | Specifies the location of the FreeMarker template file.
The location is specified with two parameters:
The form of the identifier string depends on the type value.
|
Yes | |||||||||
Title | String | If set, this string overrides the title and the topmost heading in the generated HTML. | No | RSS Feeds | ||||||||
TOCItemThreshold | Positive integer | The total number of items (not feeds, but individual items) that must be displayed before curn will generate a table of contents header in the HTML. A value of 0 causes curn to generate a table of contents regardless of how many items are displayed. | No | A very large number, which effectively disables the table of contents entirely. |
You can also write your own FreeMarker template, to change the output format. See the subsection entitled Writing Your Own FreeMarker template, in the Extending curn section, below.
Additional Configuration Items | ||||
---|---|---|---|---|
Variable | Argument type | Explanation | Required? | Default value |
HTMLEncoding | String | Specify the character encoding to use when writing the HTML. The encoding will be stored in an HTML <META> tag, and it will be used by the Java runtime when opening the output file (to ensure proper translation of characters from the in-memory Unicode character set). This parameter is mapped to the FreeMarkerOutputHandler's encoding parameter. | No | "utf-8" |
SaveAs | File name or path name | Save a copy of the generated HTML to the specified file. The argument is the path to the file. | No | None (i.e., no copy is saved) |
SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated HTML, but don't make it available to the user. (i.e., Don't display it on standard output, and don't email it.) | No | false |
ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated HTML. | No | true |
Title | String | If set, this string overrides the title and the topmost heading in the generated HTML. | No | RSS Feeds |
TOCItemThreshold | Positive integer | The total number of items (not feeds, but individual items) that must be displayed before curn will generate a table of contents header in the HTML. A value of 0 causes curn to generate a table of contents regardless of how many items are displayed. | No | A very large number, which effectively disables the table of contents entirely. |
Additional Configuration Items | ||||
---|---|---|---|---|
Variable | Argument type | Explanation | Required? | Default value |
SaveAs | File name or path name | Save a copy of the generated text to the specified file. The argument is the path to the file. | No | None (i.e., no copy is saved) |
SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated text, but don't make it available to the user. | No | false |
ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. | No | true |
Additional Configuration Items | ||||
---|---|---|---|---|
Variable | Argument type | Explanation | Required? | Default value |
SaveAs | File name or path name | Save a copy of the generated text to the specified file. The argument is a relative or absolute path to the file where the feed's XML should be saved. | No | None (i.e., no copy is saved) |
SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated text, but don't make it available to the user. | No | false |
Message | String | Static text that is to be included in the output. The text appears right after the heading line and before the actual summary of the RSS feeds. The sample output was created using a Message value that points to a URL, where (presumably) the output from the HTML handler has been saved. See Example 3, below. | No | Nothing |
ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. | No | true |
Additional Configuration Items | |||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Variable | Argument type | Explanation | Required? | Default value | |||||||||||||||||||||||||||||||||||||||
Script | File name or path name | Path to the script to be invoked. The script will be called once, as if from the command line, except that additional global objects will be available via BSF. | Yes | None | |||||||||||||||||||||||||||||||||||||||
Language | String |
The scripting language, as recognized by BSF. This
handler supports all the scripting language engines that
are built into the BSF distribution. Of course, the jar
files for the scripting languages themselves must be available at
runtime, for those languages to be
available. (See the section entitled
Installing Support Software for details.)
The following values represent some of the languages available for this parameter. The BSF values come from the Languages.properties file distributed with BSF version 2.3.0. The JSR 223 languages are the set of languages supported by the JSR 223 engines at https://scripting.dev.java.net/, as of the date this document was last updated. Consult that web site for details on available JSR 223 languages. NOTE: In all cases, except Rhino, the actual script language itself does not come with the scripting infrastructure; you have to download the language separately. The scripting infrastructure merely contains bindings to (a.k.a., engines for) the various supported scripting languages.
|
Yes | false | |||||||||||||||||||||||||||||||||||||||
SaveAs | File name or path name | Save a copy of the generated text to the specified file. The argument is a relative or absolute path to the file where the feed's XML should be saved. | No | None (i.e., no copy is saved) | |||||||||||||||||||||||||||||||||||||||
SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated text, but don't make it available to the user. | No | false | |||||||||||||||||||||||||||||||||||||||
ScriptingAPI | String | Specifies which scripting infrastructure to use. Legal values are:
|
No | Default behavior: curn first tries to use the JSR 223 (javax.script) infrastructure; if that doesn't work, it tries to use BSF. If neither framework is available, and a ScriptOutputHandler is specified, curn aborts. | |||||||||||||||||||||||||||||||||||||||
ShowCurnInfo | Boolean | Whether or not to display the curn
version, curn configuration file path, and other
curn-related information at the bottom of the generated
output. Note: There's no guarantee that a given script will honor this setting. |
No | true |
Example 1: The output handler sections from a curn configuration file that produces HTML output. If curn is called with email addresses, the HTML output will be mailed to the specified email addresses. The HTML output is not saved anywhere.
    [OutputHandler]
    Class: org.clapper.curn.output.html.HTMLOutputHandler
    Disabled: false
|
Example 2: The output handler sections from a curn configuration file that produces HTML output and plain text output. If curn is called with email addresses, the text output and the HTML output will be mailed to the specified email addresses as "multipart/alternative" attachments. The output is not saved anywhere.

    [OutputHandlerText]
    Class: org.clapper.curn.output.TextOutputHandler

    [OutputHandlerHTML]
    Class: org.clapper.curn.output.html.HTMLOutputHandler

Example 3: The output handler sections from a curn configuration file that produces HTML output to a file (but not to the user), and displays (or emails) the user a text summary that contains a link to the HTML file.

    [OutputHandlerSummary]
    Class: org.clapper.curn.output.SimpleSummaryOutputHandler
    # Message assumes that generated HTML is available via the web server running
    # on internal machine "foo", at the specified location.
    Message: See http://foo/rss/news.html

    [OutputHandlerHTML]
    Class: org.clapper.curn.output.html.HTMLOutputHandler
    # Below path is assumed to correspond to URL http://foo/rss/news.html
    SaveAs: /usr/local/www/htdocs/rss/news.html
    SaveOnly: true

Example 4: The feed and output handler sections from a curn configuration file that retrieves and downloads XML feeds, caching them in a known location, without displaying them. (See Being Bandwidth Friendly for reasons why you might want to do this.)

    [vars]
    feedDir: ${system:user.home}/.curn/feeds

    [curn]
    ...

    [FeedWired]
    URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf
    SaveAs: ${vars:feedDir}/wired.rdf

    [Feed_yahoo_top]
    URL: http://rss.news.yahoo.com/rss/topstories
    SaveAs: ${vars:feedDir}/yahoo_top_stories.xml

    [Feed_cnn_top]
    URL: http://csociety.purdue.org/~jacoby/XML/CNN_TOP_STORIES.xml
    SaveAs: ${vars:feedDir}/CNN_TOP_STORIES.xml

You can run curn manually, at the command line, whenever you feel like checking your news feeds. However, this is less than useful. The best way to run curn is to have your computer's background scheduler process (e.g., cron(8) on UNIX-like systems, or the Windows Scheduler on Windows) run curn for you automatically, every so often.
I typically run curn three times a day, via cron, on weekdays and once on weekends, and I have it mail me the output. I use my personal crontab, rather than the system-wide /etc/crontab file. Here's a sample crontab entry that does this:

    0 8,12,16 * * 1-5 /usr/local/curn/bin/curn $HOME/.curn/my.cfg
    0 16 * * 0,6 /usr/local/curn/bin/curn $HOME/.curn/my.cfg

Currently, this task is left as an exercise to the reader. (I don't use Windows often enough to play with the scheduler, so I haven't gotten around to running curn that way. When I do find the time and inclination to experiment with running curn from the Windows Scheduler, I'll update this section. Unless, of course, someone else wants to supply the relevant details...)
When pulling down RSS documents from remote HTTP servers, curn does its best to minimize the amount of bandwidth it consumes. By default, it uses the following strategies to do so (though most can be overridden by configuration parameters and command-line parameters).
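As a rough illustration of the two HTTP measures mentioned in the introduction (a conditional request and a request for gzip compression), here's a minimal sketch using only the standard java.net classes. It is not curn's actual download code; the class name PoliteFetchSketch and the method prepare() are names I made up for the example, and the feed URL is the fictitious one used elsewhere in this guide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of curn: prepares a bandwidth-friendly request.
final class PoliteFetchSketch {

    // Builds (but does not open) a connection that asks the server to send
    // the feed only if it changed after lastModifiedMillis, and to gzip it.
    static HttpURLConnection prepare(String feedUrl, long lastModifiedMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
            if (lastModifiedMillis > 0) {
                // Sent to the server as the If-Modified-Since header.
                conn.setIfModifiedSince(lastModifiedMillis);
            }
            // A request only; some servers honor it and some don't.
            conn.setRequestProperty("Accept-Encoding", "gzip");
            return conn;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
            prepare("http://www.example.org/rss/bbc_world_news.rdf", 86_400_000L);
        System.out.println(conn.getRequestProperty("Accept-Encoding"));
        System.out.println(conn.getIfModifiedSince());
    }
}
```

When such a request is actually sent, a response code of HttpURLConnection.HTTP_NOT_MODIFIED (304) means the feed hasn't changed and there's nothing to download or parse.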
But there are other things you can do to be polite to remote HTTP servers.
I run curn three times a day. In practice, that's more than sufficient to keep up with the daily news feeds I want to read. Your needs may vary, but if you're using curn to poll remote RSS feeds every five minutes, you probably fall into the "impolite RSS feed user" category.
Suppose you have a number of users, all of whom run curn several times a day. Further suppose that there's significant commonality in the RSS feeds that they want to read. Rather than have each user poll the common remote HTTP servers individually, you could run a single instance of curn that downloads and saves those feeds several times a day. You could then instruct the individual users to point their curn configuration files at the local copies of the RSS feeds, instead of the remote ones.
For instance, suppose the following URLs (for a fictitious RSS aggregator service) are commonly downloaded by people on your local network:
http://www.example.org/rss/nytimes_front_page.xml http://www.example.org/rss/bbc_world_news.rdf http://www.example.org/rss/big_jimmys_blog.xml
You could run curn periodically with the following configuration file, to download each of those feeds without producing any output.

    [var]
    # "feedDir" dumps to a directory that's accessible internally via URL
    # http://hub.ourdivision.example.com/rssfeeds/
    feedDir: /usr/local/apache/htdocs/rssfeeds
    # curnDir: where this file and the cache live
    curnDir: /usr/local/etc/curn

    [curn]
    CacheFile: ${var:curnDir}/common.cache
    MaxThreads: 15
    ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
    GzipDownload: true

    #####################################
    # No output handlers are configured #
    #####################################

    [Feed_nytimes_front]
    # New York Times front page. Accessible internally as:
    # http://hub.ourdivision.example.com/rssfeeds/nytimes_front_page.xml
    URL: http://www.example.org/rss/nytimes_front_page.xml
    SaveAs: ${var:feedDir}/nytimes_front_page.xml

    [Feed_bbc_news_world]
    # BBC World News page. Accessible internally as:
    # http://hub.ourdivision.example.com/rssfeeds/bbc_world_news.rdf
    URL: http://www.example.org/rss/bbc_world_news.rdf
    SaveAs: ${var:feedDir}/bbc_world_news.rdf

    [Feed_big_jimmy]
    # Big Jimmy's Blog. (Why is this blowhard so popular?)
    # Accessible internally as:
    # http://hub.ourdivision.example.com/rssfeeds/big_jimmys_blog.xml
    URL: http://www.example.org/rss/big_jimmys_blog.xml
    SaveAs: ${var:feedDir}/big_jimmys_blog.xml

Note that you could use curn in this manner even if your users are not using curn to read their RSS feeds. You could still run your periodic instance of curn to download the common feeds to a directory that's part of an internal web site, and instruct your users to point whatever RSS readers they're using to those internal web pages, instead of the external ones.
This section is intended for Java programmers who want to extend curn's capabilities by writing additional output handlers, integrating a different RSS parser, or even writing a new command-line or GUI front-end to curn's main logic.
Roughly speaking, curn's processing is divided into the following phases:
Since curn permits you to specify the parsing and output handler classes in its configuration file, you can easily extend curn's capabilities by writing your own output handler or integrating a different RSS parser. In addition, as of version 3.0, curn supports general-purpose plug-ins that can intercept various phases of curn processing.
Suppose you've found (or written) an RSS parser that you prefer to use instead of ROME. (Perhaps it's faster than ROME, or perhaps it supports some new RSS syntax that ROME does not support. Or, perhaps you're just playing around.) Integrating that parser with curn requires writing adapter classes that implement an interface and extend some abstract classes provided with the curn software. Those classes are:
Class | Abstract Class or Interface? | Description |
---|---|---|
org.clapper.curn.parser.RSSParser | Interface | Defines a simplified view of an RSS parser. Classes implementing this interface must provide a default public constructor and a parseRSSFeed() method. |
org.clapper.curn.parser.RSSChannel | Abstract class | Defines a simplified view of an RSS channel (a.k.a., a parsed feed). RSSParser.parseRSSFeed() must return an object that extends the RSSChannel class. |
org.clapper.curn.parser.RSSItem | Abstract class | Defines a simplified view of an RSS item (a.k.a., one of the items within a parsed feed). The RSSChannel class's getItems() method returns a collection of objects that extend the RSSItem class. |
Typically, integrating a new parser means writing a set of adapter classes that implement the above interfaces or extend the above classes, and map the necessary calls onto methods in the real underlying parser. For a sample integration, see the classes in the curn source code package org.clapper.curn.parser.rome. Those classes implement a simple adapter for the third-party ROME RSS parser.
Note: If you've written an adapter for an unsupported RSS parser engine, you'll have to make your adapter classes and the RSS parser classes available to curn. It's not sufficient to add the appropriate jar files to your CLASSPATH environment variable. Please refer to the section entitled Installing Supporting Software for details.
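To make the adapter idea concrete, here's a minimal, self-contained sketch of the pattern. Be warned that SimpleParser, SimpleChannel, and SimpleItem are hypothetical stand-ins of my own; they do not reproduce the real signatures of RSSParser, RSSChannel, and RSSItem. OtherParser is likewise an invented placeholder for whatever third-party parser you're integrating. For the real thing, consult the classes in org.clapper.curn.parser.rome.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ParserAdapterSketch {

    // Hypothetical stand-ins for curn's parser interface and abstract classes.
    interface SimpleParser { SimpleChannel parse(String source); }
    static abstract class SimpleItem { abstract String getTitle(); }
    static abstract class SimpleChannel { abstract Collection<SimpleItem> getItems(); }

    // Invented third-party parser with its own, incompatible API:
    // it treats every non-blank line of the input as an article title.
    static final class OtherParser {
        List<String> readTitles(String source) {
            List<String> titles = new ArrayList<>();
            for (String line : source.split("\n")) {
                if (!line.trim().isEmpty()) {
                    titles.add(line.trim());
                }
            }
            return titles;
        }
    }

    // The adapter: implements the simplified interface and maps each call
    // onto the underlying parser.
    static final class OtherParserAdapter implements SimpleParser {
        private final OtherParser delegate = new OtherParser();

        public OtherParserAdapter() { }   // default public constructor

        @Override
        public SimpleChannel parse(String source) {
            List<SimpleItem> items = new ArrayList<>();
            for (String title : delegate.readTitles(source)) {
                items.add(new SimpleItem() {
                    @Override String getTitle() { return title; }
                });
            }
            return new SimpleChannel() {
                @Override Collection<SimpleItem> getItems() { return items; }
            };
        }
    }

    public static void main(String[] args) {
        SimpleChannel channel = new OtherParserAdapter().parse("First\n\nSecond\n");
        for (SimpleItem item : channel.getItems()) {
            System.out.println(item.getTitle());
        }
    }
}
```

The point of the exercise is that the rest of the program sees only the simplified types; nothing outside the adapter needs to know which parser is doing the work.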
As of version 3.0, curn supports plug-ins. curn plug-ins can intercept various phases of curn processing and can enhance or modify curn's behavior. This section discusses curn's plug-in support.
If invoked properly (e.g., via the curn shell script or the curn.bat DOS script created by the curn installer), curn will search for and load plug-ins before it begins doing its real work. curn looks for plug-ins in the following directories:
curn searches all jar files, zip files, and subdirectories in each of those directories, looking for any non-abstract, public class that implements one or more of the curn Java plug-in interfaces. curn then attempts to load and instantiate each plug-in. Once a plug-in has been instantiated, its capabilities are available. Most plug-ins are dormant; that is, they don't do anything unless activated by a plug-in-specific configuration entry.
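The selection rule described above (non-abstract, public, implements a plug-in interface) can be expressed with standard Java reflection. The following is only a sketch of that rule, not curn's class-finding code: it doesn't scan jar files at all, and the nested PlugIn interface is a hypothetical stand-in for curn's real plug-in interfaces.

```java
import java.lang.reflect.Modifier;

final class PlugInDiscoverySketch {

    // Hypothetical stand-in for a curn plug-in interface.
    public interface PlugIn { String getPlugInName(); }

    // Abstract: should be rejected.
    public static abstract class AbstractBase implements PlugIn { }

    // Public and concrete: should be accepted.
    public static class Concrete implements PlugIn {
        public Concrete() { }
        public String getPlugInName() { return "Concrete"; }
    }

    // Not public: should be rejected.
    static class Hidden implements PlugIn {
        public String getPlugInName() { return "Hidden"; }
    }

    // True if the class is a public, non-abstract implementation of PlugIn.
    static boolean isLoadable(Class<?> cls) {
        int mods = cls.getModifiers();
        return PlugIn.class.isAssignableFrom(cls)
            && !cls.isInterface()
            && Modifier.isPublic(mods)
            && !Modifier.isAbstract(mods);
    }

    // Instantiates a plug-in through its public no-argument constructor.
    static PlugIn instantiate(Class<?> cls) {
        try {
            return (PlugIn) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Can't instantiate " + cls.getName(), ex);
        }
    }

    public static void main(String[] args) {
        Class<?>[] candidates = {
            Concrete.class, AbstractBase.class, Hidden.class, PlugIn.class
        };
        for (Class<?> cls : candidates) {
            System.out.println(cls.getSimpleName() + " loadable? " + isLoadable(cls));
        }
        System.out.println(instantiate(Concrete.class).getPlugInName());
    }
}
```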
After curn loads all the plug-ins it can find, it sorts them by "sort key" (a special field that each plug-in is required to provide), using a case-insensitive comparison. All plug-ins within a given execution phase are, therefore, invoked in alphabetical order by sort key. This bit of processing trivia is useful if you need to ensure that one plug-in fires before another plug-in.
curn ships with a set of stock plug-ins. Some of those plug-ins implement capabilities that were formerly in the curn core code; others provide new functionality. These plug-ins are automatically available (provided you don't delete the curn-plugins.jar file that's shipped with curn). The following table summarizes the stock plug-ins. All stock plug-ins are in the org.clapper.curn.plugins package.
Plug-in Name | Class name | Explanation | Plug-in Configuration Parameters |
---|---|---|---|
Allow Embedded HTML | AllowEmbeddedHTMLPlugIn | Enables or disables embedded HTML in a feed's output. | AllowEmbeddedHTML |
Article Filter | ArticleFilterPlugIn | Filter items (articles) from a feed, based on regular expressions. | ArticleFilter |
Common XML Fixups | CommonXMLFixupsPlugIn | Fix some common syntax errors in downloaded XML. See the configuration parameters for details on these fixups. | CommonXMLFixups |
Disable Feed | DisableFeedPlugIn | Disable a feed, without having to comment it out or remove it from the configuration. | Disabled (feed) |
Disable Output Handler | DisableOutputHandlerPlugIn | Disable an output handler, without having to comment it out or remove it from the configuration. | Disabled (output handler) |
Edit Parsed Feed URL | ParsedFeedURLEditPlugIn | Edit the parsed feed data, changing the feed URL and/or the individual item (article) URLs. | EditFeedURL, EditItemURL, PruneURLs |
Email Output | EmailOutputPlugIn | Email any output created by the output handlers to one or more recipients. | SMTPHost, MailOutputTo, MailFrom, MailSubject, MailIndividualArticles |
Empty Article Summary | EmptyArticleSummaryPlugIn | How to handle an empty summary in an article. | ReplaceEmptySummaryWith |
Feed Max Summary Size | FeedMaxSummarySizePlugIn | Truncate a feed's summary to a maximum number of characters. | MaxSummarySize |
Feed Summary Only | FeedSummaryOnlyPlugIn | Optionally strips the full content for a feed, leaving only the summary. | SummaryOnly |
Gzip Download | GzipDownloadPlugIn | Optionally requests that web sites gzip (compress) the XML content before sending it to curn, to save network bandwidth. | GzipDownload |
Ignore Old Articles | IgnoreOldArticlesPlugIn | Suppress articles in a feed that are older than a specified interval. | IgnoreArticlesOlderThan |
Ignore Duplicate Articles | IgnoreDuplicateArticlesPlugIn | Suppress duplicate articles in a feed, based on a comparison of the article titles. | IgnoreDuplicateTitles |
Max Articles | MaxArticlesPlugIn | Limits the number of articles displayed for a feed. | MaxArticlesToShow |
Override Feed Title | TitleOverridePlugIn | Overrides the title of a feed. | TitleOverride |
Prune Original RSS | PruneOriginalRSSPlugIn | Remove any already-seen items from the original RSS feed, and write the feed back out to a file. | PruneOriginalRSSTo, PruneOriginalRSSOnly |
Raw Feed Edit | RawFeedEditPlugIn | Apply regular expression edits to a feed's XML before it's parsed. | PreparseEdit |
Retain Articles | RetainArticlesPlugIn | Retains already-seen articles for a specified time. | ShowArticlesFor |
Save As | RawFeedSaveAsPlugIn | Save a feed's XML to a file. | SaveAs, SaveAsEncoding, SaveOnly |
Save As RSS | SaveAsRSSPlugIn | Convert any new data in the feed to a specified RSS format, and save the result to a file. | SaveAsRSS, SaveRSSOnly |
Show Authors | ShowAuthorsPlugIn | Enable or disable the display of the author(s) of a feed and a feed's articles. | ShowAuthors |
Show Dates | ShowDatesPlugIn | Enable or disable the display of the dates for a feed and a feed's articles. | ShowDates |
Sort Articles | SortArticlesPlugIn | Control how a feed's articles are sorted. | SortBy |
User Agent | UserAgentPlugIn | Specify the HTTP user-agent to send when downloading a feed. | UserAgent |
Zip Output | ZipOutputPlugIn | Zip any output created by the output handlers into a configured zip file. | ZipOutputTo |
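To give a feel for how small most of these plug-ins are conceptually, here's a sketch of the idea behind the Ignore Duplicate Articles plug-in: suppressing articles whose titles have already been seen in the same feed. This is not the plug-in's actual code. In particular, ignoring case and surrounding white space when comparing titles is an assumption I've made for the example, and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not the IgnoreDuplicateArticlesPlugIn source.
final class DuplicateTitleSketch {

    // Returns the titles in their original order, keeping only the first
    // article for each distinct title.
    static List<String> dropDuplicateTitles(List<String> titles) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String title : titles) {
            // Assumption: comparison ignores case and leading/trailing spaces.
            String key = title.trim().toLowerCase(Locale.ROOT);
            if (seen.add(key)) {
                kept.add(title);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> titles =
            Arrays.asList("Budget passes", "budget passes ", "Storm warning");
        System.out.println(dropDuplicateTitles(titles));
    }
}
```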
Installing a custom plug-in (i.e., not one of curn's stock plug-ins) is simple: Copy the plug-in's jar or zip file to one of the directories listed in the Overview of Plug-In Support section.
curn automatically invokes plug-ins at various phases of its execution. A plug-in that is registered for a particular phase will be called during that phase of processing. A given plug-in class can be associated with multiple phases of execution. In fact, most are, if only to permit them to intercept their plug-in-specific configuration parameters.
Each plug-in phase is represented by its own Java interface, and each interface has exactly one method. Each plug-in interface, in turn, extends the curn parent PlugIn interface, which defines some additional methods that all plug-ins must provide.
A plug-in that intercepts multiple curn processing phases must implement the interfaces for each of the phases. Here are the plug-in phases, with their associated interfaces and methods, in execution order.
Plug-in interface | Plug-in method | Description |
---|---|---|
StartupPlugIn | runStartupPlugIn() | Called immediately after curn has started, but before it has loaded its configuration file or its cache. Intercepting this phase is useful if a plug-in needs to perform initialization. |
MainConfigItemPlugIn | runMainConfigItemPlugIn() | Called immediately after curn has read and processed a configuration item in the main [curn] configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runMainConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignments; there's no real way to distinguish a variable assignment from a true configuration item. A plug-in that wants to provide a configuration item in the main [curn] configuration section must implement this interface. |
FeedConfigItemPlugIn | runFeedConfigItemPlugIn() | Called immediately after curn has read and processed a configuration item in a "Feed" configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runFeedConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignments; there's no real way to distinguish a variable assignment from a true configuration item. A plug-in that wants to provide a per-feed configuration item must implement this interface. |
OutputHandlerConfigItemPlugIn | runOutputHandlerConfigItemPlugIn() | Called immediately after curn has read and processed a configuration item in an "OutputHandler" configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runOutputHandlerConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignments; there's no real way to distinguish a variable assignment from a true configuration item. A plug-in that wants to provide a per-output handler configuration item must implement this interface. |
UnknownSectionConfigItemPlugIn | runUnknownSectionConfigItemPlugIn() | Called immediately after curn has read and processed a configuration item in an unknown configuration section. All configuration items are passed, one by one, to each loaded plug-in. If a plug-in class is not interested in a particular configuration item, its runUnknownSectionConfigItemPlugIn() method should simply return without doing anything. Note that some configuration items may simply be variable assignments; there's no real way to distinguish a variable assignment from a true configuration item. A plug-in that requires its own configuration file section must implement this interface. |
PostConfigPlugIn | runPostConfigPlugIn() | Called after the entire configuration has been read and parsed, but before any feeds are processed. Intercepting this event is useful for plug-ins that want to adjust the configuration. For instance: |
CacheLoadedPlugIn | runCacheLoadedPlugIn() | Called after the curn cache has been read (and after any expired entries have been purged), but before any feeds are processed. |
ForceFeedDownloadPlugIn | forceFeedDownload() | Called after the cache is loaded, but before a feed is downloaded. This method returns true if the feed should be downloaded regardless of whether it has changed, and false if curn should only download the feed if the feed has changed since the last download. The Retain Articles plug-in uses this capability to force feeds to be downloaded and parsed, so that it can find articles that should be displayed again. |
PreFeedDownloadPlugIn | runPreFeedDownloadPlugIn() | Called before a feed is downloaded (actually, before a feed is checked to see if it has new data). This method can return false to signal curn that the feed should be skipped. The plug-in method can also set values on the URLConnection used to download the feed, via URLConnection.setRequestProperty(). (Note that all URLs, even file: URLs, are passed into this method. Setting a request property on the URLConnection object for a file: URL will have no effect, though it isn't specifically harmful.) Possible uses for a pre-feed download plug-in include: |
PostFeedDownloadPlugIn | runPostFeedDownloadPlugIn() | Called immediately after a feed is downloaded. This method can return false to signal curn that the feed should be skipped. For instance, a plug-in that filters on the unparsed XML feed content could use this method to weed out non-matching feeds before they are parsed. |
PostFeedParsePlugIn | runPostFeedParsePlugIn() | Called immediately after a feed is parsed, but before it is otherwise processed. A post-feed parse plug-in has access to the parsed RSS feed data, via an RSSChannel object. This method can return false to signal curn that the feed should be skipped. For instance, a plug-in that filters on the parsed feed data could use this method to weed out non-matching feeds before they are processed any further. Similarly, a plug-in that edits the parsed data (removing or editing individual items, for instance) could use this method to do so. |
PostFeedProcessPlugIn | runPostFeedProcessPlugIn() | Called after a feed is parsed and processed. The plug-in has access to the parsed RSS feed data, via an RSSChannel object. This method can return false to signal curn that the feed should be skipped. |
PreFeedOutputPlugIn | runPreFeedOutputPlugIn() | Called immediately before a parsed feed is passed to an output handler. A pre-feed output plug-in cannot affect the feed's processing. (The time to stop the processing of a feed is in one of the other, preceding phases.) This method will be called multiple times for each feed if there are multiple output handlers. |
PostFeedOutputPlugIn | runPostFeedOutputPlugIn() | Called immediately after a parsed feed is passed to an output handler. A post-feed output plug-in cannot affect the feed's processing. (The time to stop the processing of a feed is in one of the other, preceding phases.) This method will be called multiple times for each feed if there are multiple output handlers. |
PostOutputHandlerFlushPlugIn | runPostOutputHandlerFlushPlugIn() | Called immediately after an output handler is flushed (i.e., after it has been called to process all feeds and its output has been written to a temporary file), but before that output is displayed, emailed, etc. |
PostOutputPlugIn | runPostOutputPlugIn() | Called after curn has flushed all output handlers. A post-output plug-in is a useful place to consolidate the output from all output handlers. For instance, such a plug-in might pack all the output into a zip file, or email it. (The EmailOutputPlugIn works exactly this way.) |
PreCacheSavePlugIn | runPreCacheSavePlugIn() | Called right before the curn cache is to be saved. A plug-in might choose to edit the cache at this point. |
ShutdownPlugIn | runShutdownPlugIn() | Called just before curn gets ready to exit. This method allows plug-ins to perform any clean-up they require. |
All plug-ins within a given phase are run in a particular order. The internal curn plug-in manager sorts the plug-ins for each phase according to the value of each plug-in's sort key. Each plug-in must implement a getPlugInSortKey() method returning a string to be used when sorting that plug-in. For all curn stock plug-ins, that sort key is the plug-in's short class name (i.e., the class name without its package name). This is important only if one plug-in within a phase depends on another plug-in within the phase having run first. For instance, the Save As RSS plug-in runs during the post feed-parse phase, operating on the internal parsed feed data. It runs after the Ignore Duplicate Articles plug-in, because the Ignore Duplicate Articles plug-in's short class name (IgnoreDuplicateArticlesPlugIn) sorts before the Save As RSS plug-in's short class name (SaveAsRSSPlugIn). (This example assumes that both plug-ins are enabled, via their respective configuration parameters.) If the Save As RSS plug-in's short class name were something like CurnSaveAsRSSPlugIn, it would run before the Ignore Duplicate Articles plug-in, and the saved RSS file might have duplicate articles in it. Again, if you're writing your own plug-in, you care about this operational detail only if your plug-in depends on the other plug-ins within the same phase.
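If you want to check where your own plug-in's sort key will land, the ordering is easy to reproduce. The sketch below sorts a handful of sort keys with String.CASE_INSENSITIVE_ORDER; I'm assuming that comparator is a fair approximation of the case-insensitive comparison curn performs. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of curn.
final class SortKeySketch {

    // Returns the given sort keys in the order their plug-ins would be
    // invoked within a phase, assuming a case-insensitive alphabetical sort.
    static List<String> invocationOrder(String... sortKeys) {
        List<String> ordered = new ArrayList<>(Arrays.asList(sortKeys));
        ordered.sort(String.CASE_INSENSITIVE_ORDER);
        return ordered;
    }

    public static void main(String[] args) {
        // Stock names: duplicates are removed before the RSS file is saved.
        System.out.println(
            invocationOrder("SaveAsRSSPlugIn", "IgnoreDuplicateArticlesPlugIn"));
        // With the renamed class, the save would happen first.
        System.out.println(
            invocationOrder("CurnSaveAsRSSPlugIn", "IgnoreDuplicateArticlesPlugIn"));
    }
}
```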
A plug-in class doesn't have to do anything special to register itself with curn. Merely implementing the appropriate interfaces is sufficient, as long as curn can find the plug-in class at run-time.
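The "implement the interface and you're registered" idea can be sketched in a few lines. The following is an illustration of the general technique, not curn's plug-in manager: the PlugIn, StartupPhase, and PostParsePhase interfaces are hypothetical, cut-down stand-ins for the real plug-in interfaces, and the method names are invented. The dispatcher simply asks each loaded plug-in whether it implements the interface for the current phase.

```java
import java.util.ArrayList;
import java.util.List;

final class PhaseDispatchSketch {

    // Hypothetical stand-ins: one parent interface, one interface per phase,
    // one method per phase interface.
    interface PlugIn { String getPlugInSortKey(); }
    interface StartupPhase extends PlugIn { void runStartup(List<String> log); }
    interface PostParsePhase extends PlugIn { boolean runPostParse(String feedTitle); }

    // Participates in one phase only.
    static final class Greeter implements StartupPhase {
        public String getPlugInSortKey() { return "Greeter"; }
        public void runStartup(List<String> log) { log.add("Greeter started"); }
    }

    // Participates in two phases by implementing both interfaces.
    static final class EmptyTitleFilter implements StartupPhase, PostParsePhase {
        public String getPlugInSortKey() { return "EmptyTitleFilter"; }
        public void runStartup(List<String> log) { log.add("EmptyTitleFilter started"); }
        // Returning false signals that the feed should be skipped.
        public boolean runPostParse(String feedTitle) { return !feedTitle.isEmpty(); }
    }

    static void runStartupPhase(List<PlugIn> plugIns, List<String> log) {
        for (PlugIn p : plugIns) {
            if (p instanceof StartupPhase) {
                ((StartupPhase) p).runStartup(log);
            }
        }
    }

    // Returns false as soon as any participating plug-in vetoes the feed.
    static boolean runPostParsePhase(List<PlugIn> plugIns, String feedTitle) {
        for (PlugIn p : plugIns) {
            if (p instanceof PostParsePhase
                    && !((PostParsePhase) p).runPostParse(feedTitle)) {
                return false;
            }
        }
        return true;
    }

    static List<PlugIn> samplePlugIns() {
        List<PlugIn> plugIns = new ArrayList<>();
        plugIns.add(new EmptyTitleFilter());
        plugIns.add(new Greeter());
        return plugIns;
    }

    public static void main(String[] args) {
        List<PlugIn> plugIns = samplePlugIns();
        List<String> log = new ArrayList<>();
        runStartupPhase(plugIns, log);
        System.out.println(log);
        System.out.println(runPostParsePhase(plugIns, "World News"));
        System.out.println(runPostParsePhase(plugIns, ""));
    }
}
```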
Of course, there's nothing quite like an example to clarify things. So, here's a simple plug-in that:
It uses a convenience class, org.clapper.util.io.Zipper, to simplify the chore of writing the zip file. Here's the source (with comments stripped out). Note that this is a stripped-down version of the actual ZipOutputPlugIn.

    import org.clapper.curn.Curn;
    import org.clapper.curn.CurnConfig;
    import org.clapper.curn.CurnException;
    import org.clapper.curn.MainConfigItemPlugIn;
    import org.clapper.curn.OutputHandler;
    import org.clapper.curn.PostOutputPlugIn;

    import org.clapper.util.config.ConfigurationException;
    import org.clapper.util.logging.Logger;
    import org.clapper.util.io.Zipper;
    import org.clapper.util.text.TextUtil;

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;

    public class ZipOutputPlugIn
        implements MainConfigItemPlugIn, PostOutputPlugIn
    {
        private static final String VAR_ZIP_FILE = "ZipOutputTo";

        private File zipFile = null;

        private static Logger log = new Logger (ZipOutputPlugIn.class);

        public ZipOutputPlugIn()
        {
        }

        public String getPlugInName()
        {
            return "Zip Output";
        }

        public String getPlugInSortKey()
        {
            return getPlugInName();
        }

        public void initPlugIn()
            throws CurnException
        {
        }

        public void runMainConfigItemPlugIn (String     sectionName,
                                             String     paramName,
                                             CurnConfig config)
            throws CurnException
        {
            try
            {
                if (paramName.equals (VAR_ZIP_FILE))
                {
                    String zipFilePath =
                        config.getConfigurationValue (sectionName, paramName);
                    this.zipFile = new File (zipFilePath);
                }
            }

            catch (ConfigurationException ex)
            {
                throw new CurnException (ex);
            }
        }

        public void runPostOutputPlugIn (Collection<OutputHandler> outputHandlers)
            throws CurnException
        {
            if (zipFile != null)
            {
                log.debug ("Zipping output to \"" + zipFile.getPath() + "\"");
                zipOutput (outputHandlers);
            }
        }

        /*----------------------------------------------------------------------*\
                                  Private Methods
        \*----------------------------------------------------------------------*/

        private void zipOutput (Collection<OutputHandler> outputHandlers)
            throws CurnException
        {
            try
            {
                boolean haveFiles = false;

                // First, figure out whether we have any output or not.

                for (OutputHandler handler : outputHandlers)
                {
                    if (handler.hasGeneratedOutput())
                    {
                        haveFiles = true;
                        break;
                    }
                }

                if (! haveFiles)
                {
                    // None of the handlers produced any output.

                    log.error ("Warning: None of the output handlers "
                             + "produced any zippable output.");
                }

                else
                {
                    // Create the zip file.

                    Zipper zipper = new Zipper (zipFile, /* flatten */ true);

                    for (OutputHandler handler : outputHandlers)
                    {
                        File file = handler.getGeneratedOutput();
                        if (file != null)
                        {
                            log.debug ("Zipping \"" + file.getPath() + "\"");
                            zipper.put (file);
                        }
                    }

                    zipper.close();
                }
            }

            catch (IOException ex)
            {
                throw new CurnException (ex);
            }
        }
    }

To activate this plug-in, simply ensure that it is available to curn at startup, and add this configuration directive to the [curn] section of the configuration file:

    # Unix
    ZipOutputTo: /tmp/curn.zip

    # Windows
    #ZipOutputTo: c:\\temp\\curn.zip

For more examples, please see the source code for the stock plug-ins delivered with curn.
Beginning with curn 3.1, plug-ins (and output handlers, for that matter) can save runtime metadata to the curn metadata store (formerly called the "cache") and restore that data in a subsequent curn run. To be able to save and restore metadata, a plug-in must:
Once the plug-in is registered as a persistent data client, curn will:
The following code fragment shows how a plug-in might register itself.

    import org.clapper.curn.CurnException;
    import org.clapper.curn.DataPersister;
    import org.clapper.curn.DataPersisterFactory;
    import org.clapper.curn.AbstractPersistentDataClient;
    ...

    public class MyPlugIn extends AbstractPersistentDataClient implements ...
    {
        ...

        public void initPlugIn()
            throws CurnException
        {
            // Register this plug-in as a persistent data client.
            DataPersister dataPersister = DataPersisterFactory.getInstance();
            dataPersister.addPersistentDataClient(this);
        }

        ...
    }

Because FreeMarker relies on external templates to create the final document, and because curn makes it easy to use custom-crafted templates, you can easily create your own custom documents, or your own branded HTML output, without having to write your own output handler. To create your own curn FreeMarker template, you must understand two things:
A tutorial on FreeMarker is beyond the scope of this document. The remainder of this section assumes you have some familiarity with FreeMarker. If you don't know how FreeMarker works, please consult the on-line FreeMarker documentation before reading this section.
A FreeMarker template relies on the presence of a tree of data, supplied by the program. FreeMarker calls this data tree a "data model". The FreeMarkerOutputHandler creates a curn-specific FreeMarker data model for use within a template. The FreeMarkerOutputHandler is as much a data mapper as an output handler: It maps curn's internal RSS data structures into a FreeMarker data model, then invokes the FreeMarker template engine to transform the template and data model into a document.
The curn FreeMarker data model is described below. The data model notation used here is similar to the notation used within the FreeMarker documentation.

    Tree                                         Description

    (root)
     |
     +-- curn
     |    |
     |    +-- showToolInfo ..................... [boolean] whether or not to display curn information in the output
     |    +-- version .......................... [String] version of curn
     |    +-- buildID .......................... [String] curn's build ID
     |
     +-- totalItems ............................ [int] total items for all channels
     +-- dateGenerated ......................... [Date] date generated
     +-- extraText ............................. [String] extra text, from the config
     +-- encoding .............................. [String] encoding, from the config
     |
     +-- tableOfContents ....................... hash of table-of-contents data
     |    |
     |    +-- needed ........................... [boolean] whether a table of contents is needed
     |    +-- (channels) ....................... sequence of channel table of contents items
     |         |
     |         +-- channel ..................... table of contents entry for one channel
     |         |    |
     |         |    +-- title .................. [String] channel title
     |         |    +-- totalItems ............. [int] total items in channel
     |         |    +-- channelAnchor .......... [String] HTML anchor name for channel
     |         |
     |         +-- channel
     |         ...
     |
     +-- (channels) ............................ sequence of channel (feed) data
          |
          +-- channel .......................... hash for a single channel (feed)
          |    |
          |    +-- index ....................... [int] channel's index in list
          |    +-- totalItems .................. [int] total items in channel
          |    +-- title ....................... [String] channel title
          |    +-- description ................. [String] channel description, or "" if not available
          |    +-- anchorName .................. [String] HTML anchor name for channel
          |    +-- url ......................... [String] channel's URL (as published in the feed's XML)
          |    +-- configuredURL ............... [String] channel/feed URL (as listed in the curn configuration file)
          |    +-- id .......................... [String] channel's unique ID (which might just be the URL)
          |    +-- date ........................ [Date] channel's last-modified date (might be missing)
          |    +-- rssFormat ................... [String] RSS format of channel (Atom, RSS 0.92, etc.). Empty if not to be shown.
          |    +-- author ...................... [String] the author or authors of the item, combined in a single string, or ""
          |    +-- (authors) ................... sequence of (String) names of authors of the feed
          |    +-- (items) ..................... sequence of channel items
          |         |
          |         +-- item ................... entry for one item
          |         |    |
          |         |    +-- index ............. [int] item's index in channel
          |         |    +-- title ............. [String] item's title
          |         |    +-- url ............... [String] item's URL (as published in the feed's XML)
          |         |    +-- date .............. [Date] the date (might be missing)
          |         |    +-- author ............ [String] the author or authors of the item, combined in a single string, or ""
          |         |    +-- authors ........... a sequence of individual (String) author names. Might be empty.
          |         |    +-- description ....... [String] description/summary
          |         |
          |         +-- item
          |         ...
          |
          +-- channel
          ...

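If it helps to think of the data model in Java terms: FreeMarker can generally consume ordinary Java maps and lists as a data model, with a map for each hash and a list for each sequence. The sketch below builds a tiny model with that shape, using only java.util classes. It's an approximation for illustration, not the FreeMarkerOutputHandler's code; the key names come from the tree above, but every value (the version string, URLs, titles, and the choice to start indexes at 1) is made-up sample data.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the data model's shape; all values are samples.
final class DataModelSketch {

    static Map<String, Object> buildModel() {
        // The "curn" hash.
        Map<String, Object> curn = new LinkedHashMap<>();
        curn.put("showToolInfo", Boolean.TRUE);
        curn.put("version", "x.y (sample)");
        curn.put("buildID", "sample-build");

        // One item within one channel.
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("index", 1);
        item.put("title", "Sample article");
        item.put("url", "http://www.example.org/articles/1");
        item.put("author", "");
        item.put("authors", new ArrayList<String>());
        item.put("description", "<p>Sample summary</p>");

        List<Map<String, Object>> items = new ArrayList<>();
        items.add(item);

        // One channel (feed).
        Map<String, Object> channel = new LinkedHashMap<>();
        channel.put("index", 1);
        channel.put("totalItems", items.size());
        channel.put("title", "Sample feed");
        channel.put("description", "");
        channel.put("url", "http://www.example.org/rss/sample.xml");
        channel.put("items", items);

        List<Map<String, Object>> channels = new ArrayList<>();
        channels.add(channel);

        // The root of the tree.
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("curn", curn);
        root.put("totalItems", items.size());
        root.put("dateGenerated", new Date());
        root.put("extraText", "");
        root.put("encoding", "UTF-8");
        root.put("channels", channels);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(buildModel().keySet());
    }
}
```

With a model like this, a template expression such as ${curn.version} resolves to the "version" entry of the "curn" map, and <#list channels as channel> walks the "channels" list.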
The FreeMarkerOutputHandler also places four FreeMarker methods in the data model:
Methods in the Data Model

Method Name | Explanation | Arguments | Examples |
---|---|---|---|
wrapText | Wraps text at the end of the line, on word boundaries. Uses the org.clapper.util Java Utility Library's WordWrapWriter class. | | ${wrapText (item.title, 4)} ${wrapText (item.description, 4, 50)} |
indentText | Indents the specified string. | | ${indentText (item.url, 4)} ${indentText (channel.url, 8)} |
stripHTML | Strips all HTML tags from the specified string. Especially useful for plain text templates. | | ${stripHTML (item.description)} |
escapeHTML | Escapes special HTML characters in the specified string. For instance, "&" is converted to "&amp;", "<" is converted to "&lt;", etc. | | ${escapeHTML (item.description)} |
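To make the last two methods more concrete, here is a minimal Java sketch of what stripping and escaping HTML amount to. It is not curn's implementation: the class name HtmlTextSketch, its method names, and the regular expression are assumptions made for illustration, and a regex-based tag stripper is only adequate for simple markup.

```java
// Illustrative stand-ins for the template methods stripHTML and escapeHTML.
// These are NOT curn's implementations; the names and logic are assumptions.
final class HtmlTextSketch {

    // Remove anything that looks like an HTML tag. A regular expression is
    // good enough for simple feed summaries, but not for arbitrary HTML.
    static String stripHtml(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // Replace the characters that are special in HTML with entities.
    // "&" is handled first so the entities added afterwards are untouched.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(stripHtml("<p>New <b>release</b> today</p>"));
        System.out.println(escapeHtml("AT&T <rocks>"));
    }
}
```

In a plain text template you would typically strip tags from item.description; in an HTML template you would escape text that must appear literally.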
Below is a sample template, which is largely identical to the built-in text template. This template illustrates the use of the data model. You can view a version of the text template, as well as the HTML template and the simple summary template, by following these links:
Sample FreeMarker template

    ${title}
    <#if extraText != "">
    ${wrapText (extraText)}
    </#if>
    <#list channels as channel>
    ---------------------------------------------------------------------------
    ${wrapText (channel.title, 0)}
    ${channel.url}
    <#if channel.date?exists>
    ${channel.date?string("E, dd MMM, yyyy 'at' HH:mm:ss")}
    </#if>
    <#list channel.items as item>
    ${wrapText (item.title, 4)}
    ${indentText (item.url, 4)}
    <#assign desc = stripHTML(item.description)>
    <#if desc != "">
    ${wrapText (desc, 8)}
    </#if>
    </#list>
    </#list>
    ---------------------------------------------------------------------------
    <#if (curn.showToolInfo)>
    curn, ${curn.version}
    Generated ${dateGenerated?string("EEEEEE, dd MMMM, yyyy 'at' HH:mm:ss zzz")}
    </#if>
There are two ways to write your own output handler: You can write a Java class that implements the output handler, or you can write a script using a scripting language supported by the Apache Jakarta Bean Scripting Framework (BSF). Both approaches are discussed below.
Before you write your own output handler, however, consider whether you can accomplish the same ends by writing a FreeMarker template and using the FreeMarkerOutputHandler. If you're planning to create a different output file format (as opposed to writing an output handler to send data over a network connection or to a database), then there's a good chance that writing a FreeMarker template will be simpler and faster.
Writing a new output handler is reasonably straightforward:
To illustrate the concept, let's look at an output handler that simply writes each channel and its items as plain text. (As it turns out, this example is a stripped-down version of the existing org.clapper.curn.output.TextOutputHandler class. The real source code has more comments and documentation and also extends a common base class.)
First, let's look at the top of the class and the required init() method:
    import org.clapper.curn.CurnConfig;
    import org.clapper.curn.CurnException;
    import org.clapper.curn.ConfiguredOutputHandler;
    import org.clapper.curn.OutputHandler;
    import org.clapper.curn.FeedInfo;
    import org.clapper.curn.parser.RSSChannel;
    import org.clapper.curn.parser.RSSItem;

    import org.clapper.util.io.WordWrapWriter;
    import org.clapper.util.text.TextUtil;
    import org.clapper.util.text.Unicode;
    import org.clapper.util.misc.Logger;
    import org.clapper.util.config.ConfigurationException;
    import org.clapper.util.config.NoSuchSectionException;

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.File;
    import java.io.FileNotFoundException;

    import java.util.Date;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;

    public class MyOutputHandler implements OutputHandler
    {
        private static final String HORIZONTAL_RULE =
            "---------------------------------------"
          + "---------------------------------------";

        private WordWrapWriter out         = null;
        private CurnConfig     config      = null;
        private String         message     = null;
        private Collection     channels    = new ArrayList();
        private int            totalItems  = 0;
        private File           outputFile  = null;
        private boolean        saveOnly    = false;

        private static Logger log = new Logger (MyOutputHandler.class);

        public MyOutputHandler()
        {
        }

        /**
         * Initializes the output handler for another set of RSS channels.
         *
         * @param config     the parsed curn configuration data
         * @param cfgHandler the ConfiguredOutputHandler wrapper object that
         *                   contains this object; the wrapper has some useful
         *                   metadata, such as the object's configuration section
         *                   name and extra variables.
         *
         * @throws ConfigurationException  configuration error
         * @throws CurnException           some other initialization error
         */
        public void init (CurnConfig config, ConfiguredOutputHandler cfgHandler)
            throws ConfigurationException,
                   CurnException
        {
            String sectionName = cfgHandler.getSectionName();
            String saveAs = null;

            this.config = config;

            // Read the handler-specific configuration parameters.
            try
            {
                if (sectionName != null)
                {
                    saveAs = config.getOptionalStringValue (sectionName,
                                                            "SaveAs",
                                                            null);
                    saveOnly = config.getOptionalBooleanValue (sectionName,
                                                               "SaveOnly",
                                                               false);
                    message = config.getOptionalStringValue (sectionName,
                                                             "Message",
                                                             null);

                    if (saveOnly && (saveAs == null))
                    {
                        throw new ConfigurationException
                            (sectionName,
                             "SaveOnly can only be "
                           + "specified if SaveAs "
                           + "is defined.");
                    }
                }
            }

            catch (NoSuchSectionException ex)
            {
                throw new ConfigurationException (ex);
            }

            // Use the configured file, or fall back to a temporary file.
            if (saveAs != null)
                outputFile = new File (saveAs);

            else
            {
                try
                {
                    outputFile = File.createTempFile ("curn", null);
                    outputFile.deleteOnExit();
                }

                catch (IOException ex)
                {
                    throw new CurnException ("Can't create temporary file.", ex);
                }
            }

            try
            {
                log.debug ("Opening output file \"" + outputFile + "\"");
                out = new WordWrapWriter (new FileWriter (outputFile));
            }

            catch (IOException ex)
            {
                throw new CurnException ("Can't open file \""
                                       + outputFile.getPath()
                                       + "\" for output",
                                         ex);
            }

            channels.clear();
            totalItems = 0;
        }
curn calls the init() method right after instantiating the output handler class. One of the init() method's primary responsibilities is to handle any special handler-specific configuration variables. It does so by:
The init() method also performs any other initialization required by the output handler class.
Note: The sample init() method, above, does a little more work than it needs to do. As it turns out, the curn API provides a useful abstract base class called org.clapper.curn.output.FileOutputHandler that implements the OutputHandler interface. FileOutputHandler provides an init() method that:
FileOutputHandler requires that the subclass provide:
With that in mind, let's simplify our original init() method and class definition. Changes from the original, above, are marked with "CHANGED" comments.

    import org.clapper.curn.CurnConfig;
    import org.clapper.curn.CurnException;
    import org.clapper.curn.ConfiguredOutputHandler;
    import org.clapper.curn.OutputHandler;
    import org.clapper.curn.FeedInfo;
    import org.clapper.curn.output.FileOutputHandler;   // CHANGED: new import
    import org.clapper.curn.parser.RSSChannel;
    import org.clapper.curn.parser.RSSItem;

    import org.clapper.util.io.WordWrapWriter;
    import org.clapper.util.text.TextUtil;
    import org.clapper.util.text.Unicode;
    import org.clapper.util.misc.Logger;
    import org.clapper.util.config.ConfigurationException;
    import org.clapper.util.config.NoSuchSectionException;

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.File;
    import java.io.FileNotFoundException;

    import java.util.Date;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;

    // CHANGED: extend the FileOutputHandler base class
    public class MyOutputHandler extends FileOutputHandler
    {
        private static final String HORIZONTAL_RULE =
            "---------------------------------------"
          + "---------------------------------------";

        private WordWrapWriter out         = null;
        private CurnConfig     config      = null;
        private String         message     = null;
        private Collection     channels    = new ArrayList();
        private int            totalItems  = 0;
        private File           outputFile  = null;
        private boolean        saveOnly    = false;

        private static Logger log = new Logger (MyOutputHandler.class);

        public MyOutputHandler()
        {
        }

        /**
         * Perform any subclass-specific initialization. Subclasses must
         * override this method.
         *
         * @param config     the parsed curn configuration data
         * @param cfgHandler the ConfiguredOutputHandler wrapper
         *                   containing this object; the wrapper has some useful
         *                   metadata, such as the object's configuration section
         *                   name and extra variables.
         *
         * @throws ConfigurationException  configuration error
         * @throws CurnException           some other initialization error
         */
        // CHANGED: initOutputHandler() replaces init()
        public void initOutputHandler (CurnConfig config,
                                       ConfiguredOutputHandler cfgHandler)
            throws ConfigurationException,
                   CurnException
        {
            String sectionName = cfgHandler.getSectionName();

            this.config = config;

            try
            {
                if (sectionName != null)
                {
                    // CHANGED: Only need to handle the "Message" parameter.
                    // The FileOutputHandler parent class handles "SaveAs"
                    // and "SaveOnly"
                    message = config.getOptionalStringValue (sectionName,
                                                             "Message",
                                                             null);
                }
            }

            catch (NoSuchSectionException ex)
            {
                throw new ConfigurationException (ex);
            }

            // CHANGED: the parent class decides where the output goes
            outputFile = super.getOutputFile();

            try
            {
                log.debug ("Opening output file \"" + outputFile + "\"");
                out = new WordWrapWriter (new FileWriter (outputFile));
            }

            catch (IOException ex)
            {
                throw new CurnException ("Can't open file \""
                                       + outputFile.getPath()
                                       + "\" for output",
                                         ex);
            }

            channels.clear();
            totalItems = 0;
        }
Next, let's look at the output-related methods. There are several output-related methods that are required by the OutputHandler interface:
Here's the code for the methods our sample class must implement:
    // Note: convert(), CONTENT_AS_SUMMARY_MAXSIZE and the Version class are
    // not defined in this excerpt; they come from the full curn source.

    public void displayChannel (RSSChannel channel, FeedInfo feedInfo)
        throws CurnException
    {
        Collection items = channel.getItems();

        // Kept local here for brevity; setIndent() returns the level it is given.
        int indentLevel = setIndent (0);

        if ((items.size() != 0) || (! config.beQuiet()))
        {
            // Emit a site (channel) header.
            out.println();
            out.println (HORIZONTAL_RULE);
            out.println (convert (channel.getTitle()));
            out.println (channel.getLink().toString());

            Date date = channel.getPublicationDate();
            if (date != null)
                out.println (date.toString());

            if (config.showRSSVersion())
                out.println ("(Format: " + channel.getRSSFormat() + ")");
        }

        if (items.size() != 0)
        {
            // Now, process each item.
            String s;

            for (Iterator it = items.iterator(); it.hasNext(); )
            {
                RSSItem item = (RSSItem) it.next();

                setIndent (++indentLevel);

                out.println ();

                s = item.getTitle();
                out.println ((s == null) ? "(No Title)" : convert (s));

                s = item.getAuthor();
                if (s != null)
                    out.println ("By " + convert (s));

                out.println (item.getLink().toString());

                Date date = item.getPublicationDate();
                if (date != null)
                    out.println (date.toString());

                s = item.getSummary();
                if (TextUtil.stringIsEmpty (s))
                {
                    // Hack for feeds that have no summary but have
                    // content. If the content is small enough, use it
                    // as the summary.
                    s = item.getFirstContentOfType (new String[]
                                                    {
                                                        "text/plain",
                                                        "text/html"
                                                    });
                    if (! TextUtil.stringIsEmpty (s))
                    {
                        s = s.trim();
                        if (s.length() > CONTENT_AS_SUMMARY_MAXSIZE)
                            s = null;
                    }
                }

                else
                {
                    s = s.trim();
                }

                if (s != null)
                {
                    out.println();
                    setIndent (++indentLevel);
                    out.println (convert (s));
                    setIndent (--indentLevel);
                }

                setIndent (--indentLevel);
            }
        }

        else
        {
            if (! config.beQuiet())
            {
                setIndent (++indentLevel);
                out.println ();
                out.println ("No new items");
                setIndent (--indentLevel);
            }
        }

        setIndent (0);
    }

    // Sets the writer's line prefix to the given number of indentation
    // units and returns that level.
    private int setIndent (int level)
    {
        StringBuffer buf = new StringBuffer();

        for (int i = 0; i < level; i++)
            buf.append (" ");

        out.setPrefix (buf.toString());
        return level;
    }

    public void flush() throws CurnException
    {
        out.println ();
        out.println (HORIZONTAL_RULE);
        out.println ("curn, version " + Version.VERSION);
        out.println ("Generated " + new Date().toString());
        out.flush();
        out = null;
    }

    public String getContentType()
    {
        return "text/plain";
    }
This particular output handler's displayChannel() method summarizes the channel and item data, writing it to the output file that the init() method opened. The flush() method simply finishes the display.
With this model, it's possible to create output handlers that produce all kinds of output, including (for instance):
Writing a script output handler is even simpler, in a way, than writing a Java output handler. You simply write a script in a supported language, then configure an instance of the ScriptOutputHandler class to point to your script.
The ScriptOutputHandler uses the Apache Jakarta Bean Scripting Framework (BSF) or the JSR 223 scripting engine to call scripts. It currently supports any scripting language that has a binding to either scripting infrastructure. See the configuration section for the ScriptOutputHandler class for the list of sample languages.
Note: curn comes bundled with a compatible version of the BSF bsf.jar file. If you're running Java 6, and you want JSR 223 support for languages other than JavaScript, please see https://scripting.dev.java.net/.
The ScriptOutputHandler class's displayChannel() method doesn't actually generate any output. Instead, it buffers the channels so that the flush() method can invoke the script. That way, the overhead of invoking the script occurs only once.
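The buffer-then-flush idea can be sketched in a few lines of Java. This is only an illustration of the pattern, not the ScriptOutputHandler source: BufferingHandlerSketch and ScriptRunner are hypothetical names, and channels are reduced to plain title strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "buffer in displayChannel(), do the work once in flush()".
// BufferingHandlerSketch and ScriptRunner are hypothetical names; they are
// not part of the curn API.
final class BufferingHandlerSketch {

    // Stand-in for whatever actually invokes the script.
    interface ScriptRunner {
        void run(List<String> channelTitles);
    }

    private final List<String> buffered = new ArrayList<>();
    private final ScriptRunner runner;

    BufferingHandlerSketch(ScriptRunner runner) {
        this.runner = runner;
    }

    // Called once per channel: remember the channel, produce no output.
    void displayChannel(String channelTitle) {
        buffered.add(channelTitle);
    }

    // Called once at the end: hand every buffered channel to the script.
    void flush() {
        runner.run(new ArrayList<>(buffered));
        buffered.clear();
    }

    public static void main(String[] args) {
        int[] invocations = {0};
        BufferingHandlerSketch handler = new BufferingHandlerSketch(titles -> {
            invocations[0]++;
            System.out.println("script saw " + titles.size() + " channels");
        });
        handler.displayChannel("Feed A");
        handler.displayChannel("Feed B");
        handler.flush();
        System.out.println("script invocations: " + invocations[0]);
    }
}
```

However many channels are displayed, the runner is called once, which is the property that keeps the script start-up cost constant.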
The ScriptOutputHandler object exposes a special curn object to the invoked script; that object contains the following fields and methods, all of which are available to the script. The curn object is exposed via BSFManager.declareBean(), which means it is a global variable that is automatically accessible to the script, without the need for the script to call any methods to find it.
curn field or method name | Corresponding "registered" BSF bean (for backward compatibility) | Java type | Explanation |
---|---|---|---|
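curn.channels | channels | java.util.Collection | A Collection of special internal objects that wrap both RSSChannel and FeedInfo objects. The wrapper objects provide two methods: getChannel(), which returns the RSSChannel object, and getFeedInfo(), which returns the FeedInfo object. |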
curn.outputPath | outputPath | java.lang.String | The path to the output file. The script should write its output to that file. Overwriting the file is fine. If the script generates no output, then it can ignore the file. |
curn.config | config | CurnConfig | The org.clapper.curn.CurnConfig object that represents the parsed configuration data. Useful in conjunction with the "configSection" object, to parse additional parameters from the configuration. |
curn.configSection | configSection | java.lang.String | The name of the configuration file section in which the output handler was defined. Useful if the script wants to access additional script-specific configuration data. |
curn.setMIMEType() | (none) | (none) | The script should call this method and pass it the MIME type that corresponds to the generated output. If the script generates no output, then it can ignore this method. |
(none) | mimeType | java.io.PrintWriter | A PrintWriter object to which the script should print the MIME type that corresponds to the generated output. If the script generates no output, then it can ignore this object. |
curn.logger | logger | org.clapper.util.misc.Logger | A Logger object, useful for logging messages to the curn log file. |
(none) | version | java.lang.String | Full curn version string, in case the script wants to include it in the generated output. |
curn.getVersion() | (none) | java.lang.String | Method that returns the full curn version string, in case the script wants to include it in the generated output. |
Here's a sample Jython script that shows how to put it all together. This script reimplements most of the functionality of the org.clapper.curn.output.TextOutputHandler Java class that comes with curn. (Note that the script uses an org.clapper.util.io.WordWrapWriter object for its output. While the word-wrapping functionality could have been implemented directly in Jython, this strategy both saves time and demonstrates how easily you can use existing Java classes from a Jython script.)
Class documentation, copyrights, etc., have been stripped from the script for brevity. You can find the complete script, along with a JRuby implementation of the same functionality, in the curn source bundle, in directory src/org/clapper/curn/output/script.
    import sys
    from org.clapper.curn import CurnException
    from org.clapper.util.io import WordWrapWriter

    HORIZONTAL_RULE = "---------------------------------------" \
                    + "---------------------------------------"

    def process_channels():
        """
        Process the channels passed in through the Bean Scripting Framework.
        """

        # If we didn't care about wrapping the output, we'd just use:
        #
        #     out = open (curn.outputPath, "w")
        #
        # But it'd be nice to wrap long summaries on word boundaries at
        # the end of an 80-character line. For that reason, we use the
        # Java org.clapper.util.io.WordWrapWriter class.

        out = WordWrapWriter (open (curn.outputPath, "w"))
        out.setPrefix ("")

        msg = curn.config.getOptionalStringValue (curn.configSection,
                                                  "Message",
                                                  None)

        totalNew = 0

        # First, count the total number of new items

        for channel_wrapper in curn.channels:
            channel = channel_wrapper.getChannel()
            totalNew = totalNew + channel.getItems().size()

        if totalNew > 0:
            # If the config file specifies a message for this handler,
            # display it.

            if msg != None:
                out.println (msg)
                out.println ()

            # Now, process the items

            indentation = 0
            for channel_wrapper in curn.channels:
                channel = channel_wrapper.getChannel()
                feed_info = channel_wrapper.getFeedInfo()
                process_channel (out, channel, feed_info, indentation)

            curn.setMIMEType ("text/plain")

            # Output a footer

            indent (out, indentation)
            out.println ()
            out.println (HORIZONTAL_RULE)
            out.println (curn.getVersion())
            out.flush()

    def process_channel (out, channel, feed_info, indentation):
        """
        Process all items within a channel.
        """
        curn.logger.debug ("Processing channel \"" + str (channel.getTitle()) + "\"")

        # Print a channel header

        indent (out, indentation)
        out.println (HORIZONTAL_RULE)
        out.println (channel.getTitle())
        out.println (channel.getLinks()[0].toString())
        out.println (str (channel.getItems().size()) + " item(s)")

        date = channel.getPublicationDate()
        if date != None:
            out.println (str (date))

        if curn.config.showRSSVersion():
            out.println ("(Format: " + channel.getRSSFormat() + ")")

        indentation = indentation + 1
        indent (out, indentation)
        for item in channel.getItems():
            # These are RSSItem objects
            out.println()
            out.println (item.getTitle())
            out.println (str (item.getLinks()[0]))

            date = item.getPublicationDate()
            if date != None:
                out.println (str (date))

            out.println()

            summary = item.getSummary()
            if summary != None:
                indent (out, indentation + 1)
                out.println (summary)
                indent (out, indentation)

    def indent (out, indentation):
        """
        Apply a level of indentation to a WordWrapWriter, by changing
        the WordWrapWriter's prefix string.

        out         - the org.clapper.util.io.WordWrapWriter
        indentation - the numeric indentation level
        """

        prefix = ""
        for i in range (indentation):
            prefix = prefix + " "

        out.setPrefix (prefix)

    # ---------------------------------------------------------------------------

    process_channels()
The configuration section for the ScriptOutputHandler class provides a detailed description of the configuration parameters. Here is a sample configuration entry for our TextOutputHandler.py Jython script.
    [OutputHandlerJythonScript]
    Class: org.clapper.curn.output.script.ScriptOutputHandler
    #SaveAs: ${system:user.home}/curn/rss-py.txt
    #SaveOnly: true
    Language: jython
    Script: ${system:user.home}/curn/TextOutputHandler.py
    Message: Copy saved in file ${system:user.home}/curn/rss-py.txt
As noted in the Overview of Plug-In Support section, curn searches for plug-ins in the following directories:
curn implicitly adds those directories to the internal class path used by the curn custom class loader. curn also loads the following directories into its class loader:
If you've written or installed a plug-in, output handler or RSS parser adapter that requires some third-party support software (e.g., a third-party RSS parser engine), or if you want to enable logging using a non-bundled logging framework such as Log4J, you'll have to install the appropriate support jars somewhere where curn can find them. If you install the jars in the lib directory underneath the curn installation directory, then they'll be available to any user who runs that curn installation. However, if you don't have permission to update that directory, or you want to make your extensions available only to yourself, then you can install your software in the appropriate lib directory under your home directory.
In the event of problems, your first step should be to enable logging. The next section discusses curn's logging infrastructure.
curn issues log messages via the Jakarta Commons Logging API (JCL), so it'll log to any JCL-compatible logging framework. The two most popular frameworks are Log4J and the java.util.logging framework that comes with the Java JDK or J2SE runtime.
curn's graphical installer automatically installs the JCL jars, but it does not install Log4J (or any other third-party logging framework); so, by default, curn will log via the java.util.logging API.
When initialized at runtime, some underlying logging frameworks will automatically begin logging if they find an appropriate configuration file in some default location. To prevent this behavior, curn does not initialize the JCL layer unless the --logging command-line parameter is specified. If --logging is not specified, curn will not issue log messages even if default logging configuration files are present.
All curn Java classes are in packages within the org.clapper.curn namespace.
Each logging framework has its own initialization files; the following two sections show how you might enable logging for the two more popular logging frameworks, java.util.logging and Log4J.
Note: Please be aware that, as of curn 3.0, there are some subtle "issues" with JDK logging. If you happened to be using the org.clapper.util.logging.JavaUtilLoggingTextFormatter class to format your output, you're out of luck. In general, you're far better off using Log4J. The gory details, for those who care about such things, follow.
curn now uses a special bootstrap mechanism to enable the use of plug-ins. As part of this bootstrap logic, curn installs its own class loader. However, the built-in JDK logging API doesn't play well in that environment. It always uses the system class loader—the CLASSPATH-driven class loader—to find its classes. Normally, this isn't too much of a problem, but curn doesn't rely on CLASSPATH to find its code; instead, it uses its own class loader. Among the jar files curn's class loader searches is the org.clapper.util utility library, the very library that contains the JavaUtilLoggingTextFormatter class. If you try to ensure that the org.clapper.util utility library is in the CLASSPATH (and, therefore, available to the JDK logging API), you run the risk of causing problems with curn's runtime environment.
Further, if you are using any third-party handlers or formatters (such as SMTPHandler), you have to ensure that the jar files containing them are listed in the classpath. To do that, you'll have to modify the shell script or Windows command file used to invoke curn.
If you're not using the JavaUtilLoggingTextFormatter class in the org.clapper.util library, and you're not using any third-party formatters, then this warning probably doesn't apply to you.
This section assumes that you're using a properties file to configure the logging framework; if you're using a custom logging configuration class, you'll have to work out the configuration details yourself.
When you use the java.util.logging framework with curn, you must specify the location of the logging configuration file with the system property java.util.logging.config.file. According to the Javadoc for the LogManager class, the property may be set via the Preferences API or as a command-line property definition passed to the java command.
To invoke curn on a Unix-like system, so that it logs through the java.util.logging framework, you might use a command line like this:
java -Djava.util.logging.config.file=/home/bmc/curn/logging.properties org.clapper.curn.Tool --logging /home/bmc/curn/curn.cfg
The curn shell script installed by the graphical installer understands -D parameters; if you use the shell script, you can shorten the above command to:
curn -Djava.util.logging.config.file=/home/bmc/curn/logging.properties --logging /home/bmc/curn/curn.cfg
Alternatively, you can set those values in the CURN_JAVA_VM_ARGS environment variable, and curn will automatically supply them to the Java virtual machine. For example:
    export CURN_JAVA_VM_ARGS="-Djava.util.logging.config.file=/home/bmc/curn/logging.properties"
    curn --logging /home/bmc/.curn/curn.cfg
The Windows curn.bat command file currently does not understand -D parameters, so you must use the CURN_JAVA_VM_ARGS environment variable method. For instance:
    set CURN_JAVA_VM_ARGS=-Djava.util.logging.config.file=/home/bmc/curn/logging.properties
    curn --logging %HOME%\.curn\curn.cfg
Here's a configuration file that logs all curn messages at the "info" level or more severe to a file. (Change "INFO" to "FINEST" to get messages at the debug level.)
    handlers=java.util.logging.FileHandler
    .level=FINEST
    # %h is replaced with the user's home directory
    java.util.logging.FileHandler.pattern = %h/curn/log.out
    java.util.logging.FileHandler.level=FINEST
    java.util.logging.FileHandler.count = 1
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
    org.clapper.curn.level=INFO
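The interplay between the root ".level" entry and the "org.clapper.curn.level" entry can be checked with a small, self-contained Java sketch. It is not part of curn: the class JulLevelSketch and its wouldLog() method are hypothetical, the properties are loaded from a string rather than from a file, and the handler lines are left out so that no log file is created. The logger name org.clapper.curn.Tool is used only as an example of a name inside the org.clapper.curn namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only.
final class JulLevelSketch {

    // Load logging properties from a string, then report whether the named
    // logger would emit a message at the given level.
    static boolean wouldLog(String properties, String loggerName, Level level) {
        try {
            LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(
                    properties.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return Logger.getLogger(loggerName).isLoggable(level);
    }

    public static void main(String[] args) {
        // Same level settings as the sample file, minus the handlers.
        String props = ".level=FINEST\n" + "org.clapper.curn.level=INFO\n";

        // Loggers under org.clapper.curn inherit the INFO threshold.
        System.out.println(wouldLog(props, "org.clapper.curn.Tool", Level.INFO));
        System.out.println(wouldLog(props, "org.clapper.curn.Tool", Level.FINE));
    }
}
```

The first check is expected to succeed and the second to fail, because the more specific org.clapper.curn.level entry takes precedence over the root level for loggers in that namespace.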
If you want to have exceptions mailed to you (which can be useful when running curn from a scheduler, such as cron(8)), then download and install the SMTPHandler class from smtphandler.sourceforge.net, and use this configuration file, instead:
    handlers=java.util.logging.FileHandler, smtphandler.SMTPHandler
    .level=FINEST
    # %h is replaced with the user's home directory
    java.util.logging.FileHandler.pattern = %h/curn/log.out
    java.util.logging.FileHandler.level=FINEST
    java.util.logging.FileHandler.count = 1
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
    smtphandler.SMTPHandler.level=WARNING
    smtphandler.SMTPHandler.smtpHost=your_smtp_host_here
    smtphandler.SMTPHandler.to=your_email_address_here
    smtphandler.SMTPHandler.from=your_email_address_here
    smtphandler.SMTPHandler.subject=[SMTPHandler] curn error
    smtphandler.SMTPHandler.bufferSize=4096
    smtphandler.SMTPHandler.formatter=java.util.logging.SimpleFormatter
    org.clapper.curn.level=INFO
Note that both files use the org.clapper.util.misc.JDK14TextLogFormatter formatter class, instead of the JDK-supplied java.util.logging.SimpleFormatter. I don't care for the text format that SimpleFormatter produces, so I use a formatter from the org.clapper.util utility library that produces output that's similar to the default Log4J text formatter.
Obviously, you can use any formatter you wish, including the java.util.logging.XMLFormatter class.
Before you can use the Log4J framework with curn, you must install the log4j.jar file so that curn can find it. (You can download that file from http://logging.apache.org/log4j/.) See the section entitled Installing Support Software for details on where to install log4j.jar.
When you use the Log4J framework with curn, you must specify the location of the logging configuration file using the system property log4j.configuration. Unlike the java.util.logging framework's property, the argument to log4j.configuration is not a pathname; it's a URL.
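If you have a pathname and need the URL form, the standard Java library can do the conversion. The following sketch is not part of curn; the class name Log4jConfigUrlSketch, the method name, and the sample path are illustrative assumptions.

```java
import java.io.File;

// Sketch: derive the URL form that log4j.configuration expects from an
// ordinary path. The class name and the sample path are illustrative.
final class Log4jConfigUrlSketch {

    // File.toURI() yields an absolute "file:" URI for the path.
    static String toConfigUrl(String path) {
        return new File(path).toURI().toString();
    }

    public static void main(String[] args) {
        String url = toConfigUrl("/home/bmc/curn/logging.properties");
        System.out.println("-Dlog4j.configuration=" + url);
    }
}
```

The result begins with "file:" and can be pasted after -Dlog4j.configuration= on the command line.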
To invoke curn on a Unix-like system, so that it logs through the Log4J framework, you might use a command line like this:
java -Dlog4j.configuration=file:///home/bmc/curn/logging.properties org.clapper.curn.Tool --logging /home/bmc/curn/curn.cfg
The curn shell script installed by the graphical installer understands -D parameters; if you use the shell script, you can shorten the above command to:
curn -Dlog4j.configuration=file:///home/bmc/curn/logging.properties --logging /home/bmc/curn/curn.cfg
Alternatively, you can set those values in the CURN_JAVA_VM_ARGS environment variable, and curn will automatically supply them to the Java virtual machine. For example:
    export CURN_JAVA_VM_ARGS="-Dlog4j.configuration=file:///home/bmc/curn/logging.properties"
    curn --logging /home/bmc/.curn/curn.cfg
The Windows curn.bat command file currently does not understand -D parameters, so you must use the CURN_JAVA_VM_ARGS environment variable method. For instance:
    set CURN_JAVA_VM_ARGS=-Dlog4j.configuration=file:///home/bmc/curn/logging.properties
    curn --logging %HOME%\.curn\curn.cfg
Here's a configuration file that logs all messages at the "info" level or more severe to a file. (Change "info" to "debug" to get messages at the debug level.)
    log4j.rootLogger=info, File
    log4j.appender.File=org.apache.log4j.FileAppender
    log4j.appender.File.layout=org.apache.log4j.PatternLayout
    log4j.appender.File.file=${user.home}/curn/log.out
    # Overwrite the file each time
    log4j.appender.File.append=false
    # Print the date in ISO 8601 format
    log4j.appender.File.layout.ConversionPattern=%d %-5p (%c{1}): %m%n
    log4j.logger.org.clapper.curn=info
If you want to have exceptions mailed to you (which can be useful when running curn from a scheduler, such as cron(8)), then you have to use the Log4J LevelRangeFilter class to filter the messages going to individual Log4J appenders. You can't use a properties-based configuration file in this case, because Log4J's properties file configurator doesn't support filters. Instead, you must use an XML configuration file, such as the one shown below.
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">

      <appender name="logfile" class="org.apache.log4j.FileAppender">
        <param name="File" value="${user.home}/.curn/log.out"/>
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d %-5p (%c{1}): %m%n"/>
        </layout>
      </appender>

      <appender name="mail" class="org.apache.log4j.net.SMTPAppender">
        <param name="BufferSize" value="4096"/>
        <param name="From" value="bmc@clapper.org"/>
        <param name="To" value="bmc@clapper.org"/>
        <param name="Subject" value="curn log message"/>
        <param name="SMTPHost" value="mail.inside.clapper.org"/>
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d %-5p (%c{1}): %m%n"/>
        </layout>
        <filter class="org.apache.log4j.varia.LevelRangeFilter">
          <param name="LevelMin" value="error"/>
          <param name="AcceptOnMatch" value="true"/>
        </filter>
      </appender>

      <logger name="org.clapper.curn">
        <level value="debug"/>
        <appender-ref ref="mail"/>
        <appender-ref ref="logfile"/>
      </logger>

      <root>
        <level value="fatal"/>
        <appender-ref ref="logfile"/>
        <appender-ref ref="mail"/>
      </root>

    </log4j:configuration>
Adam Sampson's rawdog (RSS Aggregator Without Delusions Of Grandeur) is similar in spirit, features, and invocation. It's written in Python. Like curn, rawdog is intended to be run from a scheduler such as cron.
curn is copyright © 2004-2010 Brian M. Clapper. All rights reserved. This software is licensed under a BSD-style license.
curn uses software (FreeMarker) developed by the Visigoth Software Society (http://www.visigoths.org/). It is licensed under a BSD-style license.
curn uses the ASM Bytecode Manipulation Library, which is copyright © 2000-2005 INRIA, France Telecom. All rights reserved.
curn uses the Java Mail API and the Java Beans Activation Framework, which are copyright © Sun Microsystems, Inc.
curn uses the Apache Jakarta Bean Scripting Framework (BSF), the Jakarta Commons Logging API, and the Apache Xerces XML parser API. All are copyright © The Apache Software Foundation.
curn is integrated with the ROME RSS parser, which it uses by default. ROME is copyright © 2004 Sun Microsystems, Inc. and is licensed under the Apache License, Version 2.0.
[1] Normally, adding attachments to an email message results in a "multipart/mixed" message. According to RFC 1341, the differences between "multipart/mixed" and "multipart/alternative" messages are:
multipart/mixed | Intended for use when the body parts (i.e., the text and the attachments) are independent and intended to be displayed serially. For example, to send a text message with an attached image, you would use a "multipart/mixed" message. |
multipart/alternative | Each of the parts (i.e., the main text part and the attachments) is an alternative version of the same information. The most typical "multipart/alternative" message contains a plain text part (i.e., MIME type "text/plain") and an HTML text part. Both contain the same text, but the HTML part has a "richer" version of it. The recipient's mail client should either display the "best" version of the message, based on the user's environment and preferences, or offer the user a choice of which part to view. A mail reader that's capable of displaying HTML might choose to ignore the plain text part and display only the HTML attachment; by contrast, a mail reader that cannot render HTML might choose to display only the plain text part. |
[2] curn's configuration file is parsed by a subclass of the org.clapper.util.config.Configuration class, so it implicitly supports all the features provided by that class.
[3] It's also possible, though hairy, to escape the special meaning of special characters via the backslash character. For instance, you can escape the variable substitution lead-in character, '$', with a backslash, e.g., "\$". This technique is not recommended, however, because you have to double-escape any backslash characters that you want to be preserved literally. For instance, to get "\t", you must specify "\\\\t". To get a literal backslash, specify "\\\\". (Yes, that's four backslashes, just to get a single unescaped one.) This double-escaping is a regrettable side effect of how the configuration file parser processes variable values: it makes two separate passes over the value, one for metacharacter expansion and another for variable expansion. Each of those passes honors and processes backslash escapes. This problem would go away if the configuration file parser handled both metacharacter sequences and variable substitutions itself, in one pass. It doesn't currently do that, because I wanted to make use of the existing org.clapper.util.text.XStringBuffer class's decodeMetacharacters() method and the org.clapper.util.text.UnixShellVariableSubstituter class. In general, you're better off just sticking with single quotes. I may eventually fix this problem, but single quotes work now and will continue to work regardless.
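The double-escaping effect described in footnote [3] can be reproduced with a small, self-contained model. The sketch below is not the org.clapper.util code: the class TwoPassEscapeDemo and its methods are hypothetical stand-ins that only assume what the footnote states, namely that a metacharacter pass and a variable-substitution pass each consume one level of backslash escapes. The model handles only the "${name}" variable form and the \t, \n, and \\ metacharacters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of two-pass escape handling; not curn's actual parser.
public class TwoPassEscapeDemo {
    // Pass 1: decode metacharacter escapes. Unrecognized escapes (such as
    // a backslash followed by '$') are passed through unchanged.
    static String decodeMetacharacters(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case 't':  out.append('\t'); break;
                    case 'n':  out.append('\n'); break;
                    case '\\': out.append('\\'); break;
                    default:   out.append('\\').append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Pass 2: expand ${name} references. A backslash makes the following
    // character literal, which consumes a second level of escaping.
    static String substituteVariables(String s, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(i + 1));
                i += 2;
            } else if (c == '$' && i + 1 < s.length() && s.charAt(i + 1) == '{') {
                int end = s.indexOf('}', i + 2);
                if (end < 0) {              // unterminated reference: keep as-is
                    out.append(s.substring(i));
                    break;
                }
                out.append(vars.getOrDefault(s.substring(i + 2, end), ""));
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Both passes in sequence, as the footnote describes.
    static String expand(String raw, Map<String, String> vars) {
        return substituteVariables(decodeMetacharacters(raw), vars);
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("home", "/home/user");
        // Java source doubles every backslash: the first sample is the
        // configuration text consisting of four backslashes followed by t.
        String[] samples = { "\\\\\\\\t", "\\\\t", "\\${home}", "${home}/.curn" };
        for (String raw : samples) {
            System.out.println(raw + " -> " + expand(raw, vars));
        }
    }
}
```

In this model, four backslashes followed by "t" survive as a single backslash and "t", while two backslashes followed by "t" collapse to a bare "t", because each pass strips one level of escaping. That is the behavior that makes single quotes the simpler choice.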