From fb31305b1972777cbd957a9a00cbde9754084252 Mon Sep 17 00:00:00 2001
From: Google Code Exporter
Date: Wed, 15 Apr 2015 11:00:37 -0400
Subject: [PATCH] Migrating wiki contents from Google Code

---
 Documentation.md    | 185 ++++++++++++++++++++++++++++++++++++++++++++
 EarlyApiRedesign.md | 110 ++++++++++++++++++++++++++
 Installation.md     |  32 ++++++++
 ProjectHome.md      |  97 +++++++++++++++++++++++
 4 files changed, 424 insertions(+)
 create mode 100644 Documentation.md
 create mode 100644 EarlyApiRedesign.md
 create mode 100644 Installation.md
 create mode 100644 ProjectHome.md

diff --git a/Documentation.md b/Documentation.md
new file mode 100644
index 0000000..a9755df
--- /dev/null
+++ b/Documentation.md
@@ -0,0 +1,185 @@
# **work in progress** #

# What it does #

This library lets you load RDF from all kinds of data sources, aggregate it, and query it. The query syntax is friendly and easy to use, to help you explore what the semantic web can do.

## Illustrative examples ##
Loading RDF over HTTP:

```
>>> import rdfgraph
>>> g = rdfgraph.Graph()
>>> g.load("http://webscience.org/people")
<__main__.Graph object at 0x017F3A70>
>>> g.all_of_type('foaf:Person').sort('foaf:family_name').get('foaf:name').join(", ")
"Harold (Hal) Abelson, Hans Akkermans, Harith Alani, Tim Berners-Lee, Michael L. Brodie, Leslie Carr, Manuel Castells, Samantha Collins, Noshir Contractor, Richard Cyganiak, Susan Davies, David De Roure, Stefan Decker, Craig Gallen, Hugh Glaser, Jennifer Golbeck, Christopher Gutteridge, Wendy Hall, James Hendler, Lalana Kagal, Joyce Lewis, Helen Margetts, Deborah L. McGuinness, Peter Monge, Sudarshan Murthy, Nichola Need, Kieron O'Hara, Nigel Shadbolt, Steffen Staab, John Taylor, Brian Uzzi, Mark Weal, Daniel Weitzner, Bebo White, Jianping Wu, mc schraefel, Amy van der Hiel"
>>>
```
This illustrates how the `ResourceList` object (returned by `all_of_type`) helps you manipulate sets of data easily.

Loading from SPARQL endpoints:
```
>>> graph = Graph()
>>> graph.add_endpoint("http://linked4.org/lsd/sparql")
>>> rbwm = 'http://www.rbwm.gov.uk/id/authority/rbwm#id'
>>> print graph[rbwm]['rdfs:label']
Royal Borough of Windsor and Maidenhead
```
Note that you query in exactly the same way and don't have to write SPARQL. Just don't worry about how the querying happens :-)

# Graph class #
This class is your gateway into the semantic web. It has a Jena-backed data model to store data locally for processing, and it can also maintain a list of remote data sources, progressively loading data into the local graph as you query.

## Returned lists ##
Most returned lists are actually `ResourceList`s with lots of extra handy methods.

## Passing parameters ##
Most methods accept single items, parameter lists, tuples and `ResourceList`s - just try passing in whatever you have and it'll normally work.

## Getting data in ##

### add\_endpoint(uri) ###
Register a SPARQL endpoint for future automatic queries.

### import\_uri(uri) ###
Takes a single URI and loads from it directly into the graph, bypassing the web cache. It's rare that you'll want to do that...

### load(uri, ...) ###
Takes one or more URIs and loads RDF from each of them into the graph.

### String data loading functions ###
You can load RDF data from strings using the following functions:
 * load\_n3
 * load\_ntriple
 * load\_rdfxml
 * load\_turtle

### load\_sparql ###
Execute a SPARQL CONSTRUCT statement and incorporate the results into this graph.
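Pulling the loading functions above together, here is a minimal sketch of loading Turtle from a string and reading it back. The `http://example.com/alice` resource and its data are hypothetical, and the exact console echoes may differ:

```
>>> import rdfgraph
>>> g = rdfgraph.Graph()
>>> # A hypothetical one-triple Turtle document
>>> data = '<http://example.com/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .'
>>> g.load_turtle(data)
>>> print g['http://example.com/alice']['foaf:name']
Alice
```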
### read\_sparql, sparql(query) ###
Execute a SPARQL SELECT and return a `SparqlList` iterator over the results.

This does NOT import triples into the local graph, as no triples are returned by a SPARQL SELECT.

## Query ##

### resource, get(uri) ###
Given a URI, returns a Resource (`URIResource`) object that has a world of handy convenience methods for traversing, querying and updating your graph.

### has\_triple(subject, predicate, object) ###
Returns True if the given triple is present in the graph.

### triples(subject, predicate, object) ###
The main workhorse method. Returns an iterator over the triples that match the given pattern, where `None` acts as a wildcard.

### all\_of\_type(type) ###
A handy method for selecting resources based on their `rdf:type` property.

### all\_types() ###
Returns a list of all distinct `rdf:type`s.

## Other ##
### dump() ###
Render as HTML.

### expand\_uri(uri) ###
Convert a short-form URI into a full one.

### add\_namespaces, add\_ns(prefix, uri) ###
Add a prefix and URI to the graph's record of short-form URI prefixes.

### prefixes/namespaces() ###
Returns a dictionary of all URI namespace prefixes.

### set\_triple(subject, predicate, object) ###
Usually this would be done through Resource objects (see `get(uri)`), but if you need it, it's here.

### shrink\_uri(uri) ###
Convert a full URI into a prefixed one, if possible.

### to\_string(format='turtle') ###
Return an RDF rendering of this graph.

# Resource functions #
## Properties ##
### all(property) ###
Iterate over all the values of the given property.

### get(property) ###
Return the first value of this property (chosen arbitrarily). Useful when you know there's only one value.

### has(property) ###
Test if the given property exists.

### set(property, value) ###
Update the graph, adding a triple with this resource as the subject and the two given resources as the predicate and the object.

Also available through item assignment:
```
resource[property] = value
```

### properties() ###
Iterate over the distinct properties of this resource.

### property\_values() ###
Iterate over the properties of this resource and their values.

### inverse\_properties() ###
Iterate over all distinct inverse properties (where this resource is the object).

### inverse\_property\_values() ###
Iterate over all inverse properties and their values.

## Other ##
### dump() ###
Render as HTML.

### load() ###
Load RDF data from this resource's URI.

### load\_same\_as() ###
Look up the URIs of everything that is 'owl:sameAs' this resource and load RDF from them.

### short\_html() ###
Return a short HTML description of this resource, not including any arbitrary properties.

### shrink\_uri() ###
Return the short form of this resource's URI.

### to\_string ###
Export this resource as RDF.

### type() ###
Look up the 'rdf:type' of this resource.

# ResourceList functions #

### get(property) ###
For each resource, get this property, returning a further list.

### sort(property) ###
Sort the list by this property. Involves loading the whole list - beware.

### all(property) ###
For each resource, get all the values of this property, concatenating those lists.

### join(separator) ###
Call to\_string() on each resource, then join them with this separator.

### union(other) ###
Returns the union of this list with another.

### intersection(other) ###
Returns the intersection of this list and another.
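A short sketch of the set functions just described in use. The second list is hypothetical - it assumes the loaded data also contains `foaf:Agent` resources:

```
>>> people = g.all_of_type('foaf:Person')
>>> agents = g.all_of_type('foaf:Agent')  # hypothetical second list
>>> both = people.intersection(agents)    # resources appearing in both lists
>>> everyone = people.union(agents)       # all resources from either list
>>> everyone.get('foaf:name').join(", ")
```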
diff --git a/EarlyApiRedesign.md b/EarlyApiRedesign.md
new file mode 100644
index 0000000..09019cc
--- /dev/null
+++ b/EarlyApiRedesign.md
@@ -0,0 +1,110 @@
# Classes #

These are the major concepts dealt with.

## Graph ##

An independent object that represents a local RDF graph maintained in Jena.

(Some of these method renames have not been implemented yet.)

Methods:
 * Input
  * `read_text` - guess the content type
  * `read_turtle`
  * `read_n3`
  * `read_ntriples`
  * `read_rdfxml`
  * `read_uri` - load data from a URI
  * `read_file` - load data from a file
 * Output
  * `save_text` - takes a content type parameter
  * `save_turtle`
  * `save_n3`
  * `save_ntriples`
  * `save_rdfxml`
  * `save_file` - save to a file
 * Query
  * ... all the signature variants, but no SPARQL
  * `resource`/`get`/`graph[uri]` - get a URI Resource
  * `literal` - get a Literal Resource
  * `sparql(query)` - Run a SPARQL SELECT over this graph.
  * `triples(x, y, z)` - Select statements from the graph.
 * Update
  * `add_triple(x, y, z)`
  * `remove_triple(x, y, z)`

## Endpoint ##
Represents a standalone endpoint and handles caching, querying, etc.

**More work is required here. The class is still largely bare.**

 * select
 * construct
 * describe
 * ask

 * Pending structure changes:
  * This class should:
   * handle automatically fetching triples when requested.
   * be responsible for maintaining a disk-based graph that caches those triples.
   * be responsible for maintaining efficiency of access to this endpoint.
  * (all of these responsibilities currently rest with a Dataset)

## Dataset ##
Represents a number of data sources, both SPARQL endpoints and local graphs.

Method groups:
 * Endpoints:
  * add/remove/list
 * Graphs:
  * add/remove/list
 * Query - provide a combined query system returning iterator-based data in suitable wrapper classes
  * SPARQL
  * Triple query
  * Native Python query

## Resource ##
Represents a Node (literal, URI or blank) and provides handy query methods. _This is the main workhorse_.

Methods:
 * Data
  * Get literal as native datatype
  * Get URI
  * `__nonzero__` method for blank nodes (`if node: node['some:thing']`)
 * Traversal
  * `get(property)` - get the Resource linked to by this property.
  * `all(property)` - get all Resources linked to by this property.
  * `has(property)` - check if the Resource has this property.
 * Update (see the sketch after this list):
  * `node['foaf:nick'] = 'Binky'` or `node.set('foaf:nick', 'Binky')` to replace an existing relation.
  * `node.add('foaf:nick', 'Binky')` to add a relation alongside any existing ones.
 * Interrogate - list properties, etc.
  * `properties()` - Iterate over all properties of this resource.
  * `property_values()` - Iterate over all properties of this resource and their related values.
 * Utility
  * `shrink_uri()`/`expand_uri()`
  * `is_uri()`/`is_literal()`/`is_blank()`
  * `uri()` - return the URI of this resource.
  * `value()` - return the literal value of this resource.
  * `type()` - return the 'rdf:type' of this resource.
  * `load_same_as()` - load all resources that are 'owl:sameAs' this one.
  * `load()` - Load RDF data from this Resource's URI.
  * `get_ns()` - return the namespace URI of this Resource.
  * `in_ns(ns)` - check if the Resource is in this namespace.
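A minimal sketch of the proposed update semantics. These renames are not all implemented yet, and the resource URI and values here are hypothetical:

```
>>> node = graph['http://example.com/alice']  # hypothetical resource
>>> node['foaf:nick'] = 'Binky'     # set: replaces any existing foaf:nick
>>> node.add('foaf:nick', 'Binks')  # add: keeps the old nickname too
>>> for nick in node.all('foaf:nick'):
...     print nick
Binky
Binks
```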
## ResourceList ##
Represents an iterator-based list of resources.

Method groups:
 * Set functions - combine/intersect `ResourceList`s
 * Query - handy Resource functions mapped across all list items

## Internal ##
 * Node - raw engine (Jena) output in Python datatypes.
  * Resource - A URI node.
  * Blank - just holds the Jena ID object.
  * Literal - parsed and tagged.
 * Heavy testing on the Unicode I/O to Jena through JPype.

diff --git a/Installation.md b/Installation.md
new file mode 100644
index 0000000..73424ac
--- /dev/null
+++ b/Installation.md
@@ -0,0 +1,32 @@
# Prerequisites #

This library depends on Python 2.6 or greater and [JPype](http://sourceforge.net/projects/jpype/).

# Installing JPype #

## Windows ##
 * [Download the installer](http://sourceforge.net/projects/jpype/files/JPype/0.5.4/) that matches your Python version.
 * Run it and follow the instructions.

## Linux/Mac ##
 * [Download the ZIP](http://sourceforge.net/projects/jpype/files/JPype/0.5.4/) package of JPype.
 * Extract the contents somewhere and then open a command prompt there.
 * As root, run `python setup.py install`.

# Installing python-graphite #

 * First download the latest ZIP package in the [downloads area](http://code.google.com/p/python-graphite/downloads/list).
 * Extract the contents somewhere and then open a command prompt there.
 * As root/administrator, run `python setup.py install`.

## Testing the installation ##
Run this:
```
~# python
>>> import graphite
>>> graph = graphite.Graph()
>>>
```

If you don't get a traceback, it worked! If it fails, do let me know what the error message is and I'll do my best to help.

diff --git a/ProjectHome.md b/ProjectHome.md
new file mode 100644
index 0000000..ee93d65
--- /dev/null
+++ b/ProjectHome.md
@@ -0,0 +1,97 @@
## Overview ##
A Python spin-off of Chris Gutteridge's Graphite library - http://graphite.ecs.soton.ac.uk/

The intent is to facilitate gathering exactly the data you want from wherever it happens to be, and to make it as easy to interrogate as possible. To this end there is a single class that contacts SPARQL endpoints and holds any other data you have, and lets you query it all using rich wrapper classes - `graph.all_of_type('foaf:Person').sort('foaf:family_name').get('foaf:name')`

Most features are in place and working, but the project is still new, so it may not always be as efficient as it could be. More updates to follow soon.

In the long run it will be able to tap into Jena's powerful inference and query capabilities to provide a tailored, flexible tool for exploring what's possible with the semantic web.

### Contact ###
If you find a bug, or it just fails, please do let me know and I'll fix it.

Comments gratefully received at **python-graphite@rklyne.net**

I'm also on twitter as [@ronanklyne](https://twitter.com/ronanklyne)

## Getting started ##
### Installation ###
There is an easy [installation guide on the wiki](http://code.google.com/p/python-graphite/wiki/Installation).

### And off you go... ###

Once you're running you can do things like this:
```
>>> g = Graph()
>>> g.load("http://webscience.org/people")
<__main__.Graph object at 0x017F3A70>
>>> g.all_of_type('foaf:Person').sort('foaf:family_name').get('foaf:name').join(", ")
"Harold (Hal) Abelson, Hans Akkermans, Harith Alani, Tim Berners-Lee, Michael L. Brodie, Leslie Carr, Manuel Castells, Samantha Collins, Noshir Contractor, Richard Cyganiak, Susan Davies, David De Roure, Stefan Decker, Craig Gallen, Hugh Glaser, Jennifer Golbeck, Christopher Gutteridge, Wendy Hall, James Hendler, Lalana Kagal, Joyce Lewis, Helen Margetts, Deborah L. McGuinness, Peter Monge, Sudarshan Murthy, Nichola Need, Kieron O'Hara, Nigel Shadbolt, Steffen Staab, John Taylor, Brian Uzzi, Mark Weal, Daniel Weitzner, Bebo White, Jianping Wu, mc schraefel, Amy van der Hiel"
>>>
```
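The same chaining works for multi-valued properties via the documented `all` method. A hedged sketch, assuming the loaded data contains `foaf:knows` links and that `all` returns a further `ResourceList`:

```
>>> people = g.all_of_type('foaf:Person')
>>> # Everyone known by anyone in the list, as one combined ResourceList
>>> people.all('foaf:knows').get('foaf:name').join(", ")
```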
**You can do this with SPARQL too!**
```
>>> graph = Graph()
>>> graph.add_endpoint("http://linked4.org/lsd/sparql")
>>> rbwm = 'http://www.rbwm.gov.uk/id/authority/rbwm#id'
>>> print graph[rbwm]['rdfs:label']
Royal Borough of Windsor and Maidenhead
```

Easy, huh?

There is a wiki page with [full documentation](http://code.google.com/p/python-graphite/wiki/Documentation), but I'd recommend just playing around for a bit.

(Hopefully it will just start up and work, but if not, there's a section of JPype/Java path related config in `config.ini` that might need tinkering with.)

## Multiple data sources ##

All of this works just the same when you add multiple data sources!

Linked data is all about the linking. Interrogating multiple disparate sources at once is one of the main reasons I put this tool together. Try connecting up data from everyone who will provide it and see what you get :-)

This feature is still under development (the project is six days old) and won't be very clever or fast about using 10 or more SPARQL endpoints together, but it should work.

## Features ##

Done and working:
 * Jena-backed data model in Python
 * Handy pythonic query syntax (see [run.py](http://code.google.com/p/python-graphite/source/browse/examples/run.py) for examples)
 * Add new triples: `graph.get('person:1').set('foaf:nick', 'Binky') # Add a nickname`
 * Run SPARQL queries over data in memory
 * Run SPARQL SELECTs against remote endpoints.
 * Import into local graphs with SPARQL CONSTRUCT statements.
 * Config niceties done.
 * HTML output
 * ResourceList set functions
 * RDF output in something other than Turtle (maybe)
 * Automatically import data from SPARQL endpoints as you query. (It's primitive, but it works!)
 * Read in from:
  * HTTP
  * String
  * TTL
  * N3
  * RDF/XML
  * File URI

## Futures ##
Some things that need doing:
 * More and better documentation.
 * Read in from RDFa.
 * Delay SPARQL queries in some kind of continuation object.
  * This would make using SPARQL endpoints much more efficient.
 * Optimise.

Some ideas of where to go next:
 * 'Live graphs' - Try to remove the dependency on big local datastores and increase our facility for bringing data in and forgetting it when we're done, as we do with the web today.
 * 'Magic SPARQLs' - given a list of endpoints, work out what each one can do and query the appropriate endpoints without being explicitly asked.

## Dependencies ##
**Requires [Python 2.6+](http://python.org/) and [JPype](http://sourceforge.net/projects/jpype/)**. You should go get these.

[Jena](http://jena.sourceforge.net/index.html) 2.6.4 has been included, but you can use your own copy quite easily - see `config.ini`.

JPype and Jena are both wonderful libraries without which I could not have built this tool.

I've tested this successfully on Linux and Windows 7 with Sun JVMs, but other systems ought to work.