mediawiki-utilities/python-mwxml

MediaWiki XML

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. It addresses two concerns: the complexity and the performance of streaming XML parsing. It enables memory-efficient stream processing of XML dumps through a simple iterator strategy, and it implements a distributed processing strategy (see map()) that processes many XML dump files in parallel.
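
To make the iterator strategy concrete, here is a minimal sketch of streaming over a dump using only the Python standard library. It is not mwxml's implementation; the helper names `local` and `iter_revision_ids` and the tiny `SAMPLE` document are assumptions made for illustration, and real dumps carry an XML namespace and many more fields.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified dump used only for this sketch
SAMPLE = b"""<mediawiki>
  <page>
    <title>Foo</title>
    <id>10</id>
    <revision><id>1</id><contributor><id>99</id></contributor><text>a</text></revision>
    <revision><id>2</id><text>b</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>11</id>
    <revision><id>3</id><text>c</text></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip a namespace prefix, e.g. '{http://...}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_ids(source):
    """Yield (page_title, revision_id) pairs while parsing incrementally."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Look at direct children only, so <contributor><id> is ignored
            for child in elem:
                if local(child.tag) == "id":
                    yield title, int(child.text)
                    break
            # Drop the revision's text and children once it has been handled
            elem.clear()
        elif name == "page":
            elem.clear()
            title = None


pairs = list(iter_revision_ids(io.BytesIO(SAMPLE)))
```

Because the generator clears each element after use, revision text does not accumulate while the file is read. The emptied page elements still hang off the root, so a production parser would likely also detach them.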

Example

>>> import mwxml
>>>
>>> dump = mwxml.Dump.from_file(open("dump.xml"))
>>> print(dump.site_info.name, dump.site_info.dbname)
Wikipedia enwiki
>>>
>>> for page in dump:
...     for revision in page:
...         print(revision.id)
...
1
2
3
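
The same per-file loop can be fanned out over several dump files. The sketch below shows the general idea with a standard-library thread pool. It is an assumption-laden illustration, not the library's map() function: `revision_ids`, `map_sources`, and the in-memory sample documents are hypothetical, and threads are used only to keep the example short (CPU-bound parsing would usually call for separate processes).

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two dump files
SAMPLE_A = b"<mediawiki><page><title>A</title><revision><id>1</id></revision><revision><id>2</id></revision></page></mediawiki>"
SAMPLE_B = b"<mediawiki><page><title>B</title><revision><id>3</id></revision></page></mediawiki>"


def revision_ids(source):
    """Return the revision IDs found in one dump (path or file object)."""
    ids = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "revision":
            # Direct children only, so nested contributor IDs are skipped
            for child in elem:
                if child.tag.rsplit("}", 1)[-1] == "id":
                    ids.append(int(child.text))
                    break
            elem.clear()
    return ids


def map_sources(process, sources, workers=2):
    """Run `process` on every source concurrently; yield (index, item) pairs."""
    sources = list(sources)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        for index, items in enumerate(pool.map(process, sources)):
            for item in items:
                yield index, item


results = list(map_sources(revision_ids, [io.BytesIO(SAMPLE_A), io.BytesIO(SAMPLE_B)]))
```

Each worker handles one source from start to finish, so no parser state is shared between workers; only the collected results are merged by the caller.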

Author

See also

About

A set of utilities for processing MediaWiki XML dump data.
