Python 1.2-beta-* has a whole suite of WWW modules, among them a
primitive but extensible HTML parser. Have a look at the docs, e.g.
http://www.cwi.nl/~guido/Python.html
Follow the links to the Library Reference and select the chapter on
Internet and WWW.
For the HTML parser you may have to read the source to understand how
to extend it.
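
To give a rough idea of what "extending" means here, below is a minimal
sketch of the subclass-and-override pattern. Note the assumptions: it
uses html.parser.HTMLParser from the present-day standard library, not
the parser shipped with 1.2-beta, whose method names may differ (read
its source to check). LinkCollector and collect_links are hypothetical
names made up for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass: remembers the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The base class calls this hook for each opening tag; tag and
        # attribute names arrive lowercased, attrs is a list of
        # (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Hypothetical helper: parse a string and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

You would call it as collect_links(page_text), where page_text is the
HTML of a page as a string. The base class does the tokenizing; the
subclass only overrides the hooks for the tags it cares about.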
--Guido van Rossum, CWI, Amsterdam <mailto:Guido.van.Rossum@cwi.nl>
<http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>