A collection of tests for IRC bots.

Webpage Title Fetching Tests

A collection of tests for title-fetching plugins commonly found in IRC bots.

HTML

A collection of page titles that may cause issues for naive title-fetching code.

Basic Title

A regular title, as a control. If a bot can't parse this, then something's seriously wrong.

Long Title

A very long title to test for truncation.
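One way a bot might handle this is to truncate by encoded size rather than by character count, since IRC limits a message to 512 bytes including the trailing CRLF. The sketch below is only an illustration: the function name truncate_utf8 and the 300-byte default budget are assumptions, not part of these tests.

```python
def truncate_utf8(text, max_bytes=300, suffix="..."):
    """Shorten text so its UTF-8 encoding fits in max_bytes (hypothetical helper)."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    budget = max_bytes - len(suffix.encode("utf-8"))
    # errors="ignore" drops a multi-byte character that the slice cut in half.
    return encoded[:budget].decode("utf-8", errors="ignore") + suffix
```

The budget has to leave room for the command, the target and the bot's own prefix, so the right value depends on how the bot formats its output.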

Mimic Title

Tests if the bot actually parses HTML instead of pattern matching.
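As a rough illustration of parsing instead of pattern matching, the sketch below uses Python's standard html.parser module to collect the text of the first title element. The names TitleParser and extract_title are made up for this example, and a real bot may well prefer a more forgiving HTML library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        # Text inside comments never reaches handle_data, and script
        # contents arrive here while in_title is False, so both are skipped.
        if self.in_title:
            self.parts.append(data)


def extract_title(markup):
    """Return the first title as a string, or None if no complete title exists."""
    parser = TitleParser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

A title-shaped string inside a comment or a script block is ignored by this approach, which is exactly what a regular expression over the raw page tends to get wrong.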

HTML Entity Title

A title containing HTML entities.

HTML Entity Naive Fix Title

A title containing HTML entities that produce an empty title if they are unescaped too early (before parsing) instead of after parsing.
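The ordering problem can be shown with a small sketch. The page string below is an invented example, not the actual test page, and FirstTitle and title_of are hypothetical names. Decoding entities across the whole document before parsing turns an escaped closing tag into a real one.

```python
import html
from html.parser import HTMLParser


class FirstTitle(HTMLParser):
    """Collects the text of the first <title> element, decoding entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.state = "before"  # "before" -> "inside" -> "after"
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.state == "before":
            self.state = "inside"

    def handle_endtag(self, tag):
        if tag == "title" and self.state == "inside":
            self.state = "after"

    def handle_data(self, data):
        if self.state == "inside":
            self.text += data


def title_of(markup):
    parser = FirstTitle()
    parser.feed(markup)
    parser.close()
    return parser.text


# Invented example page: the title text contains an escaped closing tag.
page = "<title>&lt;/title&gt; is a closing tag</title>"

# Parse first: entities are decoded only inside the extracted title text.
good = title_of(page)

# Decode first: the entity becomes a real closing tag, so the parser
# sees an empty <title></title> followed by stray text.
bad = title_of(html.unescape(page))
```

Here good holds the full title text, while bad is an empty string.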

HTML Unicode Entity Title

A title containing Unicode HTML entities. Try the Encoding tests for more complete Unicode handling tests.

IRC

A collection of page titles that may cause issues for naive title-fetching plugins. While most of these shouldn't affect more than the displayed page title, they may be annoying to you or other channel users.

ANSI Escape Sequence

A title containing ANSI escape colour codes. This shouldn't do any damage, but may mess with the output of a connected terminal.

Bell Character

A title containing the bell character, which will cause many clients to emit a noise.

IRC Formatting Codes

A title containing IRC formatting characters, which may interfere with the bot's own formatting.
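A bot could defend against the ANSI, bell and formatting cases above by stripping control characters before echoing a title. The following is a minimal sketch under that assumption; strip_control is a hypothetical name, and the two patterns cover only ANSI CSI sequences and mIRC-style colour codes.

```python
import re
import unicodedata

# ESC [ ... final-byte, e.g. "\x1b[31m". Other escape families are not covered.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# \x03 optionally followed by a foreground and background colour number.
IRC_COLOUR = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")


def strip_control(text):
    """Remove escape sequences, colour codes and all remaining control characters."""
    text = ANSI_CSI.sub("", text)
    text = IRC_COLOUR.sub("", text)
    # Category "Cc" covers bell, bold (\x02), CTCP's \x01, CR, LF, tab and so on.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
```

The sequences are removed before the single characters so that no "[31m" or colour-number residue is left behind. Note that this also drops tabs and newlines rather than turning them into spaces.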

IRC CTCP Action

A title containing a CTCP ACTION message, which is how /me works.

IRC CTCP Version

A title containing a CTCP VERSION message.

IRC CRLF Quit

A title containing a \r\n, which signifies the end of an IRC command, followed by a QUIT command. If a bot is doing naive string concatenation, this may result in the bot quitting IRC.
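To illustrate the CRLF and CTCP cases, here is a sketch of a message builder that neutralises line breaks and the CTCP delimiter before the text reaches the socket. build_privmsg is a hypothetical helper, and it assumes the target comes from the bot itself rather than from the fetched page.

```python
def build_privmsg(target, text):
    """Build one PRIVMSG line; untrusted text cannot end the command early."""
    # CR, LF and NUL may not appear inside an IRC message, so replace them.
    for bad in ("\r", "\n", "\x00"):
        text = text.replace(bad, " ")
    # \x01 delimits CTCP requests such as ACTION and VERSION.
    text = text.replace("\x01", "")
    return f"PRIVMSG {target} :{text}\r\n".encode("utf-8")
```

With this in place, a title such as "Title\r\nQUIT :bye" is sent as ordinary text in a single message instead of as a second command.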

HTTP

A collection of responses that may cause issues for basic HTTP clients. These are all examples of things you may run across online, or that someone malicious may use to break or inconvenience your bot.

Slow Response

A regular title, eventually. Simulates a painfully slow connection.

Never-ending Response

A response that never actually ends and just keeps streaming content. (OK, it ends after about a day, but that's more than enough time to realise there's a problem.)
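Both the slow and the never-ending response can be bounded by capping the number of bytes read and the total time spent reading. The sketch below works on any iterable of byte chunks so that it stays independent of the HTTP library. read_bounded and its default limits are assumptions for illustration.

```python
import time


def read_bounded(chunks, max_bytes=65536, max_seconds=10.0, clock=time.monotonic):
    """Read from an iterable of byte chunks until a size or time limit is hit."""
    deadline = clock() + max_seconds
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        # The deadline is only checked between chunks, so the underlying
        # connection still needs its own per-read timeout.
        if len(buf) >= max_bytes or clock() >= deadline:
            break
    return bytes(buf[:max_bytes])
```

Since the title normally sits near the top of a page, a limit of a few tens of kilobytes is usually enough, though the right number is a judgment call.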

Redirect-self Loop

Redirects to itself, forever.

Redirect-unique Loop

Redirects to another URL that redirects to another URL that... forever. Browsers protect against this by having an upper limit on the number of redirects (usually around 30).
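Both redirect loops can be handled with a hop limit plus a set of already visited URLs. In the sketch below, fetch is a hypothetical callable supplied by the bot that returns a (status, location) pair for a URL. It stands in for whatever HTTP client is in use, with automatic redirect following turned off.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    pass


def resolve_redirects(url, fetch, max_redirects=10):
    """Follow redirects by hand, stopping on loops or after max_redirects hops."""
    seen = set()
    for _ in range(max_redirects + 1):
        if url in seen:
            # Catches the self-redirect and any cycle through earlier URLs.
            raise TooManyRedirects(f"redirect loop at {url}")
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    # Catches chains of unique URLs that never repeat.
    raise TooManyRedirects(f"more than {max_redirects} redirects")
```

The visited set catches the self-loop on the second hop, while the hop counter is what stops a chain of always-new URLs.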

Redirect Localhost

Redirects to localhost.
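A bot that should never reach internal services can check where a host resolves before connecting, and repeat the check on every redirect hop. This is a sketch using the standard ipaddress and socket modules; ip_is_public and host_is_public are hypothetical names.

```python
import ipaddress
import socket


def ip_is_public(address):
    """True if the address is globally routable (not loopback, private, etc.)."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%")[0])
    return ip.is_global


def host_is_public(host):
    """True only if every address the host resolves to is public."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(ip_is_public(a) for a in addresses)
```

Checking the name and then connecting by name leaves a gap, because the name could resolve differently the second time. A stricter bot would connect to the exact address it checked.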

Encoding

A collection of responses with various encodings and metadata. They start off entirely standard and shouldn't cause difficulty for anyone, but increase in difficulty. While a lot of these are just straight up wrong, they are all taken from real-life examples. If your bot correctly gets most of them, you should be fine for ~99% of the sites out there.

UTF-8

A UTF-8 encoded page containing a title with emoji and Japanese characters.

Shift JIS

A Shift JIS encoded page containing a title with emoji and Japanese characters.

Shift JIS (no charset header)

A Shift JIS encoded page containing a title with emoji and Japanese characters, but without a charset specified in the Content-type header.

Shift JIS (no charset header, meta charset tag)

A Shift JIS encoded page containing a title with emoji and Japanese characters, without a charset specified in the Content-type header, but with a meta charset HTML tag.

Shift JIS (no charset header, meta http-equiv tag)

A Shift JIS encoded page containing a title with emoji and Japanese characters, without a charset specified in the Content-type header, but with a meta http-equiv HTML tag.
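The lookup order exercised by the cases so far (Content-type header first, then a meta tag) might be sketched as follows. pick_encoding is a hypothetical name, and the regular expression is only a rough approximation of the byte-level prescan described in the HTML standard. It accepts both the meta charset and the meta http-equiv forms.

```python
import codecs
import re
from email.message import Message

# Matches charset=... inside a <meta> tag, in either of the two common forms.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def header_charset(content_type):
    """Return the charset parameter of a Content-type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def pick_encoding(content_type, body, default="utf-8"):
    """Choose a codec name from the header, then the first 1024 bytes of body."""
    candidates = [header_charset(content_type)]
    match = META_CHARSET.search(body[:1024])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    for name in candidates:
        if name:
            try:
                return codecs.lookup(name).name
            except LookupError:
                continue  # Unknown label, try the next source.
    return default
```

The returned name can be passed straight to bytes.decode. Whether a particular codec covers every character on a given page, emoji in particular, depends on the codec and is not something this sketch checks.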

Shift JIS (incorrect header)

A Shift JIS encoded page containing a title with emoji and Japanese characters, with an incorrect charset specified in the Content-type header, but with correct meta charset and meta http-equiv HTML tags. This takes high-level magic to get correct, so don't feel too bad if this one breaks.

Shift JIS (incorrect header and meta tags)

A Shift JIS encoded page containing a title with emoji and Japanese characters, with an incorrect charset specified in the Content-type header and meta charset/http-equiv HTML tags. This one also requires arcane magic to get correct.
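When the declared charset is wrong, one possible fallback is to try several candidate encodings strictly and keep the first that decodes without error. This is only a heuristic sketch: decode_with_fallback and the candidate list are assumptions, and a successful decode does not prove the guess is right, since some single-byte codecs accept almost any input.

```python
def decode_with_fallback(body, candidates):
    """Return (text, codec_name) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return body.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue  # Wrong or unknown encoding, try the next one.
    # Last resort: never fail, but mark undecodable bytes with U+FFFD.
    return body.decode("utf-8", errors="replace"), "utf-8"
```

Ordering matters here: strict codecs such as UTF-8 should come before permissive ones, because they reject most text that was written in another encoding.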

Miscellaneous

A collection of title-fetching tests that don't really fit into any other category.

Client Info

A title containing some basic info about the client. There isn't really anything to test or protect against here, it's just something that you should remember is possible.

Whitespace

A title containing a lot of whitespace.
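Collapsing runs of whitespace is a one-liner in most languages. The Python sketch below uses a hypothetical helper name, collapse_whitespace.

```python
def collapse_whitespace(title):
    """Replace every run of whitespace with one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace, including
    # newlines and tabs, and discards empty strings.
    return " ".join(title.split())
```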

Image

An image that happens to contain an HTML-looking title. This image is tiny, but if your bot fetches a title from it, it likely downloads every file it is linked to, potentially including very large ones.
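A simple guard is to look at the media type in the Content-type header before reading the body at all. In this sketch, looks_like_html is a hypothetical helper and the set of accepted types is an assumption that a bot may want to widen or narrow.

```python
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def looks_like_html(content_type):
    """True if a Content-type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES
```

The header can of course be wrong, so this works best together with a limit on how many bytes are read.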