Generate and load ElasticSearch indexes based on JSON Table Schema descriptors.
pip install tableschema-elasticsearch
The package implements the Tabular Storage interface. elasticsearch is used as the db wrapper. We can get storage this way:
from elasticsearch import Elasticsearch
from tableschema_elasticsearch import Storage
engine = Elasticsearch()
storage = Storage(engine)
We can then interact with the storage ('buckets' are ElasticSearch indexes in this context):
storage.buckets # iterator over bucket names
storage.create('bucket', [(doc_type, descriptor)],
reindex=False, mapping_generator_cls=None)
# Reindex will copy existing documents from an existing index with the same name (not implemented yet)
# mapping_generator_cls allows customization of the generated mapping
storage.delete('bucket')
storage.describe('bucket') # return descriptor, not implemented yet
storage.iter('bucket', doc_type=optional) # yield rows
storage.read('bucket', doc_type=optional) # return rows
storage.write('bucket', doc_type, rows, primary_key,
as_generator=False)
# primary_key is a list of field names which will be used to generate document ids
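To make the primary_key comment above more concrete, here is a minimal sketch of how a document id could be derived from the primary key fields of a row. The helper name make_document_id and the hashing scheme are assumptions made for illustration; the package may generate ids differently.

```python
import hashlib
import json


def make_document_id(row, primary_key):
    """Build a deterministic document id from the primary key fields of a row.

    Hypothetical helper: the SHA-1 scheme is an assumption, not the
    package's documented behaviour.
    """
    # Collect the key values in the order given by primary_key
    key_values = [row[name] for name in primary_key]
    # Serialize deterministically so equal keys always give equal ids
    serialized = json.dumps(key_values, default=str)
    return hashlib.sha1(serialized.encode('utf-8')).hexdigest()


row = {'id': 1, 'lang': 'en', 'title': 'Hello'}
doc_id = make_document_id(row, ['id', 'lang'])
```

With ids derived this way, writing a row with the same primary key values a second time would overwrite the existing document rather than add a duplicate.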
When creating indexes, we always create an index with a semi-random name and a matching alias that points to it. This allows us to decide whether to re-index documents whenever we're re-creating an index, or to discard the existing records.
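The index-plus-alias pattern described above can be sketched as follows. The function names and the naming scheme (alias, underscore, eight hex characters) are assumptions for illustration, not the package's actual implementation. The sketch only builds the names and the alias actions and does not contact a server.

```python
import uuid


def new_index_name(alias):
    """Return a semi-random concrete index name for the given alias."""
    # uuid4().hex is lowercase, which ElasticSearch index names require
    return '{}_{}'.format(alias, uuid.uuid4().hex[:8])


def alias_swap_actions(alias, new_index, old_index=None):
    """Build alias actions that point the alias at new_index.

    If old_index is given, the alias is removed from it in the same request.
    """
    actions = []
    if old_index is not None:
        actions.append({'remove': {'index': old_index, 'alias': alias}})
    actions.append({'add': {'index': new_index, 'alias': alias}})
    return {'actions': actions}


old_index = new_index_name('bucket')
fresh_index = new_index_name('bucket')
body = alias_swap_actions('bucket', fresh_index, old_index)
```

A dictionary of this shape matches the request body of the ElasticSearch aliases API. Because readers only ever address the alias, documents can be copied from the old index into the new one (or discarded) before the alias is switched.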
When creating indexes, the tableschema types are converted to ES types and a mapping is generated for the index.
Some special properties in the schema provide extra information for generating the mapping:
array types also need to have the es:itemType property, which specifies the inner data type of array items.
object types also need to have the es:schema property, which provides a tableschema for the inner document contained in that object (or have es:enabled=false to disable indexing of that field).
Example:
{
"fields": [
{
"name": "my-number",
"type": "number"
},
{
"name": "my-array-of-dates",
"type": "array",
"es:itemType": "date"
},
{
"name": "my-person-object",
"type": "object",
"es:schema": {
"fields": [
{"name": "name", "type": "string"},
{"name": "surname", "type": "string"},
{"name": "age", "type": "integer"},
{"name": "date-of-birth", "type": "date", "format": "%Y-%m-%d"}
]
}
},
{
"name": "my-library",
"type": "array",
"es:itemType": "object",
"es:schema": {
"fields": [
{"name": "title", "type": "string"},
{"name": "isbn", "type": "string"},
{"name": "num-of-pages", "type": "integer"}
]
}
},
{
"name": "my-user-provided-object",
"type": "object",
"es:enabled": false
}
]
}
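As a rough illustration of how a descriptor like the one above could be turned into a mapping, here is a minimal sketch. The type table and the function names are assumptions and may differ from what the package generates. Date formats and nested arrays are deliberately not handled.

```python
# Assumed tableschema -> ES type table; the package's real table may differ
SIMPLE_TYPES = {
    'string': 'text',
    'integer': 'long',
    'number': 'double',
    'boolean': 'boolean',
    'date': 'date',
}


def field_to_mapping(field):
    """Convert one tableschema field descriptor into an ES property mapping."""
    field_type = field.get('type', 'string')
    if field_type == 'array':
        # ES has no dedicated array type: any field may hold several values,
        # so an array is mapped like a single item of its es:itemType
        return field_to_mapping(dict(field, type=field['es:itemType']))
    if field_type == 'object':
        if field.get('es:enabled') is False:
            # Keep the object in the stored document but do not index it
            return {'type': 'object', 'enabled': False}
        return {'type': 'object',
                'properties': schema_to_properties(field['es:schema'])}
    return {'type': SIMPLE_TYPES[field_type]}


def schema_to_properties(schema):
    """Map every field of a tableschema descriptor by name."""
    return {f['name']: field_to_mapping(f) for f in schema['fields']}


properties = schema_to_properties({'fields': [
    {'name': 'my-number', 'type': 'number'},
    {'name': 'my-array-of-dates', 'type': 'array', 'es:itemType': 'date'},
    {'name': 'my-user-provided-object', 'type': 'object', 'es:enabled': False},
]})
```

The recursion through es:schema is what lets an array of objects, such as my-library in the example, reuse the same conversion as a top-level schema.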
By providing a custom mapping generator class (via mapping_generator_cls) that inherits from the MappingGenerator class, you should be able to customize the generated mapping.
elasticsearch-py is used to access the ElasticSearch interface (see its docs).
https://github.com/frictionlessdata/tableschema-elasticsearch-py#snapshot
Please read the contribution guideline:
Thanks!