MarkupToken is a bundle of Delphi classes for parsing files written in markup languages such as HTML or XML. The main class, TTagParser, splits a file into tags and non-tags. You can use it, for example, to extract the text from HTML documents or to collect links for a web spider.
Status: Freeware; modify and use it at your own risk. I'd appreciate it if you left a comment in the source files about the original author, Joachim Pimiskern.
|TCharStream||Reads a text file into memory and provides access to it via the function nextChar(), which delivers the content as a stream of integers: 0..255 for a character, -1 for EOF.|
|TMarkupTokenType||The type of a token formed by aggregating characters into larger units. For example, "abc" is, from this point of view, not a double quote followed by 'a', 'b', 'c', and a final double quote, but simply a tt_string.|
|TMarkupTokenizer||Takes a TCharStream object as input and generates a stream of TMarkupTokenType, which is accessible via the public variables TokenType and TokenString. For example, "abc" would result in TokenType = tt_string and TokenString = abc.|
|TTagTokenType||That's the highest level of abstraction. A file in a markup language consists of tags, non-tags, and comments.|
|TTagToken||Holds the information for each occurrence of TTagTokenType. It has the key entries TokenType of type TTagTokenType, TokenString, which contains the actual value, and Data, which is a hashtable. Data has the special entries tagname and tagtype. For example, <a href="http://www.google.de"> would lead to a Data hashtable with "tagname" -> "a", "tagtype" -> "begin", "href" -> "http://www.google.de". Ending tags have "tagtype" -> "end".|
|TTagParser||Takes a stream of TMarkupTokenType as input and fills the property Tokens, a list, with elements of type TTagToken. The normal way to work with it is to iterate through the list, access each item Tokens[i], and evaluate its TokenType and Data.|
|THtmlParts||A class that provides convenient access to parts of an HTML file, such as the links, head, body, title, and meta tags.|
|TFileWithVariables||Reads a file of type TCharStream and translates every variable of the form $identifier. For each variable, the event OnTranslateVariable is triggered so that the application can replace the variable with some other text. The expanded file is returned by the function Expand().|
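
To make the layering described above more concrete, here is a minimal, language-neutral sketch in Python of the same idea: split a document into tags, non-tags, and comments, and attach a Data-style dictionary with "tagname", "tagtype", and the attributes to each tag. It is not the MarkupToken source and does not reproduce its Delphi API; the names split_tags, parse_tag, collect_links, and extract_text are hypothetical, and the sketch deliberately ignores self-closing tags, scripts, CDATA, and a '>' inside a quoted attribute value.

```python
import re

# Illustration only: all names here are hypothetical, not the MarkupToken API.
# An attribute is a name, optionally followed by = and a quoted or bare value.
ATTR_RE = re.compile(r'([^\s=]+)(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s]+))?')


def parse_tag(inner):
    """Turn the text between '<' and '>' into a Data-style dictionary."""
    body = inner.strip()
    data = {"tagtype": "begin"}
    if body.startswith("/"):          # ending tag such as </a>
        data["tagtype"] = "end"
        body = body[1:].lstrip()
    parts = body.split(None, 1)       # tag name, then the attribute text
    data["tagname"] = parts[0].lower() if parts else ""
    if len(parts) > 1:
        for name, value in ATTR_RE.findall(parts[1]):
            # Drop one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            data[name.lower()] = value
    return data


def split_tags(text):
    """Split text into a list of tag, nontag, and comment tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text.startswith("<!--", pos):
            end = text.find("-->", pos + 4)
            end = len(text) if end == -1 else end + 3
            token = {"type": "comment", "string": text[pos:end], "data": {}}
        elif text[pos] == "<":
            end = text.find(">", pos)
            end = len(text) if end == -1 else end + 1
            raw = text[pos:end]
            token = {"type": "tag", "string": raw,
                     "data": parse_tag(raw.strip("<>"))}
        else:
            end = text.find("<", pos)
            end = len(text) if end == -1 else end
            token = {"type": "nontag", "string": text[pos:end], "data": {}}
        tokens.append(token)
        pos = end                     # every branch advances past pos
    return tokens


def collect_links(tokens):
    """Return the href values of all opening <a> tags."""
    return [t["data"]["href"] for t in tokens
            if t["type"] == "tag"
            and t["data"].get("tagname") == "a"
            and t["data"].get("tagtype") == "begin"
            and "href" in t["data"]]


def extract_text(tokens):
    """Concatenate everything that is neither a tag nor a comment."""
    return "".join(t["string"] for t in tokens if t["type"] == "nontag")


html = '<p>See <a href="http://www.google.de">Google</a><!-- note --></p>'
tokens = split_tags(html)
print(collect_links(tokens))   # ['http://www.google.de']
print(extract_text(tokens))    # See Google
```

The two helper functions correspond to the use cases mentioned in the introduction: iterating over the token list and evaluating each token's type and data is enough both to collect links and to extract the plain text.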