markdown.htmlparser
This module imports a copy of html.parser.HTMLParser and modifies it heavily through monkey-patches.
A copy is imported, rather than the module itself, so that users can still import
and use the unmodified library for their own needs.
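The text above does not say how the copy is made. One way to get an independent copy of a stdlib module is to execute its source a second time with importlib. Patches applied to that copy then leave html.parser itself untouched. This is a minimal sketch under that assumption. The helper load_private_copy and the attribute example_patch are hypothetical names used only for illustration.

```python
import html.parser
import importlib.util


def load_private_copy(name: str):
    """Execute the module `name` again and return the fresh module object.

    The result is NOT registered in sys.modules, so `import html.parser`
    elsewhere still returns the original, unpatched module.
    """
    spec = importlib.util.find_spec(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


htmlparser = load_private_copy("html.parser")

# A hypothetical monkey-patch, applied to the copy only.
htmlparser.HTMLParser.example_patch = lambda self: "patched"

print(htmlparser.HTMLParser is html.parser.HTMLParser)   # False
print(hasattr(html.parser.HTMLParser, "example_patch"))  # False
```

Because the copy has its own class objects, any patch (to a class or a module-level attribute) stays local to it.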
Classes:
HTMLExtractor – Extract raw HTML from text.
markdown.htmlparser.HTMLExtractor(md: Markdown, *args, **kwargs)
Bases: HTMLParser
Extract raw HTML from text.
The raw HTML is stored in the htmlStash of the
Markdown instance passed to md and the remaining text
is stored in cleandoc as a list of strings.
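To illustrate the idea of splitting input into stashed raw HTML and remaining text, here is a toy subclass of the standard HTMLParser. It is not HTMLExtractor. It is a hypothetical, heavily simplified stand-in: a plain list plays the role of md.htmlStash, every tag is stashed (the real class is more selective), and the placeholder format [[stash:N]] is an assumption made for this sketch.

```python
from html.parser import HTMLParser


class ToyExtractor(HTMLParser):
    """Hypothetical stand-in: stash every tag, keep other text in cleandoc."""

    def reset(self):
        super().reset()
        self.stash = []     # raw HTML fragments (plays the role of md.htmlStash)
        self.cleandoc = []  # remaining text, as a list of strings

    def _store(self, raw):
        # Leave a placeholder in the text; the format is an assumption.
        self.cleandoc.append(f"[[stash:{len(self.stash)}]]")
        self.stash.append(raw)

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag's raw source, original case kept.
        self._store(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Only the lowercased name is available here, so the tag is rebuilt.
        self._store(f"</{tag}>")

    def handle_data(self, data):
        self.cleandoc.append(data)


parser = ToyExtractor()
parser.feed("Hello <b>world</b>!")
parser.close()
print(parser.stash)              # ['<b>', '</b>']
print("".join(parser.cleandoc))  # Hello [[stash:0]]world[[stash:1]]!
```

Overriding reset works because HTMLParser.__init__ calls reset(), so the stash and cleandoc lists exist before any input is fed.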
Methods:
reset – Reset this instance. Loses all unprocessed data.
close – Handle any buffered data.
at_line_start – Returns True if the current position is at the start of a line.
get_endtag_text – Returns the text of the end tag.
handle_empty_tag – Handle empty tags (<data>).
get_starttag_text – Return full source of start tag: <...>.
Attributes:
line_offset (int) – Returns the character index in self.rawdata of the start of the current line.
markdown.htmlparser.HTMLExtractor.line_offset: int
property
Returns the character index in self.rawdata of the start of the current line.
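HTMLParser tracks the current position as a 1-based line number and a column offset (both returned by getpos()). Turning the line number into an index into rawdata means finding where that line begins. A minimal sketch, written as a hypothetical standalone function rather than the property itself, might look like this:

```python
def line_offset(rawdata: str, lineno: int) -> int:
    """Return the index in rawdata where line `lineno` (1-based) starts.

    Hypothetical standalone helper. If rawdata has fewer lines than
    `lineno`, the end of the data is returned.
    """
    offset = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", offset)
        if newline == -1:
            return len(rawdata)
        offset = newline + 1  # the next line starts just after the newline
    return offset


text = "<div>\n  <p>\nhi"
print(line_offset(text, 1))  # 0
print(line_offset(text, 2))  # 6
print(line_offset(text, 3))  # 12
```

Adding the column offset from getpos() to this value gives the absolute index of the current position in rawdata.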
markdown.htmlparser.HTMLExtractor.at_line_start() -> bool
Returns True if the current position is at the start of a line.
Allows for up to three blank spaces at the start of the line.
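The rule above can be sketched as a hypothetical standalone check. It assumes the caller already knows where the current line starts (line_start) and the column of the current position within it (column). Following the description literally, only space characters count as blank here.

```python
def at_line_start(rawdata: str, line_start: int, column: int) -> bool:
    """Hypothetical standalone version of the check described above.

    line_start: index in rawdata where the current line begins
    column:     offset of the current position within that line
    """
    if column == 0:
        return True
    if column > 3:
        return False  # more than three characters of indentation
    # Everything between the line start and the position must be spaces.
    return rawdata[line_start:line_start + column] == " " * column


print(at_line_start("   <div>", 0, 3))   # True
print(at_line_start("    <div>", 0, 4))  # False
print(at_line_start("ab<div>", 0, 2))    # False
```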
markdown.htmlparser.HTMLExtractor.get_endtag_text(tag: str) -> str
Returns the text of the end tag.
If the actual text cannot be extracted from the raw data, a closing tag is built from tag instead.
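The standard parser passes only the lowercased tag name to handle_endtag, so the original source text of an end tag such as </DIV > is lost unless it is read back from rawdata. The following hypothetical standalone function sketches that behaviour. It assumes the caller knows the index (start) where the end tag begins, and it falls back to building </tag> when no end tag can be found there.

```python
def get_endtag_text(rawdata: str, start: int, tag: str) -> str:
    """Hypothetical standalone version: return the end tag's source text
    beginning at `start`, or build "</tag>" if it cannot be extracted."""
    end = rawdata.find(">", start)
    if rawdata.startswith("</", start) and end != -1:
        return rawdata[start:end + 1]
    # Fallback: assume a well-formed, lowercase closing tag.
    return f"</{tag}>"


print(get_endtag_text("x</DIV >y", 1, "div"))  # </DIV >
print(get_endtag_text("x", 1, "div"))          # </div>
```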