These classes form the basis of iterative processing of MediaWiki XML dumps. Their datatypes are based on those found in http://pythonhosted.org/mwtypes.
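"Iterative processing" means the dump is read as a stream, so only a small part of it is held in memory at any time. The following is a minimal sketch of that idea using the standard library's `xml.etree.ElementTree.iterparse`. It is not mwxml's implementation. The sample XML is a hand-written stand-in that omits the XML namespace real dumps declare, and `iter_revision_ids` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written, namespace-free stand-in for a dump (illustration only).
SAMPLE = b"""<mediawiki>
  <page>
    <title>Foo</title><id>1</id>
    <revision><id>10</id></revision>
    <revision><id>11</id></revision>
  </page>
  <page>
    <title>Bar</title><id>2</id>
    <revision><id>20</id></revision>
  </page>
</mediawiki>"""


def iter_revision_ids(f):
    """Hypothetical helper: yield (page_id, revision_id) while streaming f."""
    for _event, elem in ET.iterparse(f, events=("end",)):
        if elem.tag == "page":
            # "id" matches only the direct <id> child of <page>.
            page_id = int(elem.findtext("id"))
            for rev in elem.iterfind("revision"):
                yield page_id, int(rev.findtext("id"))
            # Drop the finished <page> subtree so memory use stays
            # roughly proportional to one page rather than the whole file.
            elem.clear()


for page_id, rev_id in iter_revision_ids(io.BytesIO(SAMPLE)):
    print(page_id, rev_id)
```

This sketch buffers a whole `<page>` before yielding its revisions; a library can stream even finer, but the memory-bounding principle is the same.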
mwxml.Dump(site_info, items)
XML dump iterator: dump-file metadata plus a Page iterator. Instances of this class can be iterated over directly. Usually, you'll want to construct this class using from_file().
| Parameters: | site_info (SiteInfo): metadata from the <siteinfo> tag; items: an iterator of mwxml.Page and/or mwxml.LogItem elements |
|---|---|
Example:

```python
from mwxml import Dump

# Construct a dump file iterator
with open("example/dump.xml") as f:
    dump = Dump.from_file(f)
    # Iterate through pages
    for page in dump.pages:
        # Iterate through a page's revisions
        for revision in page:
            print(revision.id)
```
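The Dump attributes describe one `items` stream mixing pages and log items, with `pages` and `log_items` as narrower views of it. The sketch below models that relationship with hypothetical stand-ins (`FakePage`, `FakeLogItem`, `FakeDump`); it is not mwxml's code. It assumes, as in this sketch, that both views filter the same single-pass stream, so consuming one exhausts the other. Because a dump built from a file is read as a stream, it is safest to plan on one pass per dump.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


# Hypothetical stand-ins for mwxml.Page and mwxml.LogItem.
@dataclass
class FakePage:
    id: int


@dataclass
class FakeLogItem:
    id: int
    type: str


Item = Union[FakePage, FakeLogItem]


class FakeDump:
    """Sketch: a dump whose `items` stream mixes pages and log items."""

    def __init__(self, items: Iterable[Item]):
        self.items = iter(items)

    @property
    def pages(self) -> Iterator[FakePage]:
        # Type-filtered view over the shared, single-pass `items` stream.
        return (i for i in self.items if isinstance(i, FakePage))

    @property
    def log_items(self) -> Iterator[FakeLogItem]:
        return (i for i in self.items if isinstance(i, FakeLogItem))


dump = FakeDump([FakePage(1), FakeLogItem(5, "delete"), FakePage(2)])
print([p.id for p in dump.pages])  # [1, 2]
print(list(dump.log_items))        # [] because the shared stream is used up
```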
from_file(f)
Constructs a Dump from a file pointer.
| Parameters: | f (file): a file pointer to the XML dump |
|---|---|
from_page_xml(page_xml)
Constructs a Dump from a <page> block.
| Parameters: | page_xml (str or file): XML containing a <page> block |
|---|---|
items: An iterator of mwxml.Page and/or mwxml.LogItem elements.
log_items: An iterator of mwxml.LogItem elements.
pages: An iterator of mwxml.Page elements.
site_info: Metadata from the <siteinfo> tag (a SiteInfo).
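To show what `site_info` is built from, here is a sketch that reads a `<siteinfo>` block into a plain dict with the standard library. `parse_siteinfo` is a hypothetical helper, not part of mwxml; the sample is hand-written (the generator string is a placeholder) and omits the enclosing `<mediawiki>` element and XML namespace of real dumps. Mapping `<sitename>` to `name` is our reading of the SiteInfo field list.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; "MediaWiki 1.xx" is a placeholder version string.
SITEINFO_XML = """<siteinfo>
  <sitename>Wikipedia</sitename>
  <dbname>enwiki</dbname>
  <base>https://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.xx</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="0" case="first-letter" />
    <namespace key="1" case="first-letter">Talk</namespace>
  </namespaces>
</siteinfo>"""


def parse_siteinfo(xml_text):
    """Hypothetical helper: return <siteinfo> fields as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("sitename"),
        "dbname": root.findtext("dbname"),
        "base": root.findtext("base"),
        "generator": root.findtext("generator"),
        "case": root.findtext("case"),
        # Map namespace id -> name; the main namespace has no name text.
        "namespaces": {
            int(ns.get("key")): (ns.text or "")
            for ns in root.iterfind("namespaces/namespace")
        },
    }


info = parse_siteinfo(SITEINFO_XML)
print(info["case"], info["namespaces"])  # first-letter {0: '', 1: 'Talk'}
```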
mwxml.SiteInfo(*args, **kwargs)
Represents the data from the <siteinfo> block of a MediaWiki XML dump.
name (str | None): The name of the site.
dbname (str | None): The database name of the site.
base (str | None): The URL of the site's main page.
generator (str | None): The software that generated the dump, e.g. a MediaWiki version string.
case (str | None): The site's title case-sensitivity setting, e.g. "first-letter" or "case-sensitive".
namespaces (list(mwxml.Namespace) | None): The site's namespaces.
mwxml.Page(*args, **kwargs)
Page metadata and a Revision iterator. Instances of this class can be iterated over directly. See mwtypes.Page for a description of fields.
Example:

```python
import mwxml

page = mwxml.Page( ... )
for revision in page:
    print("{0} {1}".format(revision.id, page.id))
```
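"Instances of this class can be iterated over directly" means the class implements `__iter__`. The sketch below uses a hypothetical `SketchPage` (revisions reduced to plain integer ids) to show that pattern. It assumes a streamed, single-pass revision source, which this sketch enforces by returning one shared iterator; whether mwxml.Page behaves identically is not stated in this reference.

```python
from typing import Iterable, Iterator


class SketchPage:
    """Hypothetical stand-in: page metadata plus a revision stream."""

    def __init__(self, page_id: int, title: str, revisions: Iterable[int]):
        self.id = page_id
        self.title = title
        # Keep a single iterator so revisions are consumed, not replayed.
        self._revisions = iter(revisions)

    def __iter__(self) -> Iterator[int]:
        # Iterating the page hands back the shared revision iterator.
        return self._revisions


page = SketchPage(42, "Example", [101, 102, 103])
for revision_id in page:
    print("{0} {1}".format(revision_id, page.id))
# In this sketch, a second pass finds nothing left.
leftover = list(page)
```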
mwxml.LogItem(*args, **kwargs)
Log item metadata. See mwtypes.LogItem for a description of fields.
Example:

```python
import mwxml

dump = mwxml.Dump( ... )
for log_item in dump.log_items:
    print("{0} {1}".format(log_item.id, log_item.type))
```
Deleted(*args, **kwargs)
Represents information about the deleted/suppressed status of a log item and its associated data.
from_int(integer)
Constructs a Deleted from the tinyint value of the log_deleted column of the logging MariaDB table.
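The log_deleted tinyint is a bitfield. In MediaWiki, the LogPage::DELETED_* constants assign 1 to the action, 2 to the comment, 4 to the user, and 8 to "restricted" (suppressed, hidden even from administrators). The sketch below decodes such a value; `decode_log_deleted` and the flag names are illustrative assumptions, not mwxml's implementation of from_int().

```python
# Bit values of MediaWiki's LogPage::DELETED_* constants.
LOG_DELETED_BITS = {
    "action": 1,
    "comment": 2,
    "user": 4,
    "restricted": 8,
}


def decode_log_deleted(value: int) -> dict:
    """Hypothetical helper: split a log_deleted tinyint into named flags."""
    return {name: bool(value & bit) for name, bit in LOG_DELETED_BITS.items()}


print(decode_log_deleted(6))
# {'action': False, 'comment': True, 'user': True, 'restricted': False}
```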
mwxml.Revision(*args, **kwargs)
Revision metadata and text. See mwtypes.Revision for a description of fields.
Deleted(*args, **kwargs)
Represents information about the deleted/suppressed status of a revision and its associated data.
from_int(integer)
Constructs a Deleted from the tinyint value of the rev_deleted column of the revision MariaDB table.
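rev_deleted is likewise a bitfield; MediaWiki's DELETED_* revision constants assign 1 to the text, 2 to the comment, 4 to the user, and 8 to "restricted" (suppression). This sketch models the bits with `enum.IntFlag`, which makes membership tests read naturally. `RevDeleted` and `is_suppressed` are hypothetical names, not mwxml's API.

```python
from enum import IntFlag


class RevDeleted(IntFlag):
    """Hypothetical flag type mirroring MediaWiki's rev_deleted bits."""
    TEXT = 1
    COMMENT = 2
    USER = 4
    RESTRICTED = 8


def is_suppressed(rev_deleted: int) -> bool:
    # RESTRICTED marks data hidden even from administrators (suppression).
    return bool(RevDeleted(rev_deleted) & RevDeleted.RESTRICTED)


flags = RevDeleted(5)  # TEXT | USER
print(RevDeleted.TEXT in flags, RevDeleted.COMMENT in flags)  # True False
```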
mwxml.Namespace(*args, **kwargs)
See mwtypes.Namespace for a description of fields.