Introduction

This README describes the source code and implementation of the N1QL query engine and components.

Goals

The goals of this implementation are:

  • Language completeness

  • GA code base

  • Source code aesthetics

    • Design, object orientation
    • Data structures, algorithms
    • Modularity, readability

Features

This N1QL implementation provides the following features:

  • Read

    • SELECT
    • EXPLAIN
  • DDL

    • CREATE / DROP INDEX
    • CREATE PRIMARY INDEX
  • DML

    • UPDATE
    • DELETE
    • INSERT
    • UPSERT
    • MERGE

    The ACID semantics of the DML statements have not yet been decided or implemented. Nor has the underlying support in Couchbase Server. At this time, only the DML syntax and query engine processing have been provided.

Deployment architecture

The query engine is a multi-threaded server that runs on a single node. When deployed on a cluster, multiple instances are deployed on separate nodes. This is only for load-balancing and availability. In particular, the query engine does not perform distributed query processing, and separate instances do not communicate or interact.

In production, users will have the option of colocating query engines on KV and index nodes, or deploying query engines on dedicated query nodes. Because the query engine is highly data-parallel, we have a goal of achieving good speedup on dedicated query nodes with high numbers of cores.

The remainder of this document refers to a single instance of the query engine. At this time, load balancing, availability, and liveness are external concerns that will be handled later by complementary components.

Processing sequence

  • Parse: Text to algebra. In future, we could also add JSON to algebra (e.g. if we add something like JSONiq or the Mongo query API).

  • Prepare: Algebra to plan. This includes index selection.

  • Execute: Plan to results. When we add prepared statements, this phase can be invoked directly on a prepared statement.

Packages

Value

The value package implements JSON and non-JSON values, including delayed parsing. This implementation has been measured at a 2.5x speedup over dparval.

Primitive JSON values (boolean, number, string, null) are implemented as golang primitives and incur no memory or garbage-collection overhead.

This package also provides collation, sorting, and sets (de-duplication) over Values.

  • Value: Base interface.

  • AnnotatedValue: Can carry attachments and metadata.

  • CorrelatedValue: Refers and escalates to a parent Value. Used to implement subqueries and name scoping.

  • ParsedValue: Delayed evaluation of parsed values, including non-JSON values.

  • MissingValue: Explicit representation of MISSING values. These are useful for internal processing, and can be skipped during final projection of results.

  • BooleanValue, NumberValue, StringValue, NullValue, ArrayValue, ObjectValue: JSON values.
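
As a rough illustration of the shape of this interface, here is a simplified Go sketch; the names and signatures below are illustrative assumptions, not the package's actual definitions:

 // Illustrative sketch only; the real value package is richer.
 type Value interface {
     Type() int                       // NULL, BOOLEAN, NUMBER, STRING, ARRAY, OBJECT, ...
     Actual() interface{}             // the underlying golang value
     Field(name string) (Value, bool) // object navigation
     Index(i int) (Value, bool)       // array navigation
     Collate(other Value) int         // total ordering used for sorting and sets
 }

 // A ParsedValue-style type delays full JSON parsing until first access.
 type parsedValue struct {
     raw    []byte      // original bytes, kept unparsed
     parsed interface{} // populated lazily on first Field/Index access
 }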

Errors

The errors package provides a dictionary of error codes and messages. When fully implemented, the error codes will mirror SQL, and the error messages will be localizable.

All user-visible errors and warnings should come from this package.

Expression

The expression package defines the interfaces for all expressions, and provides the implementation of scalar expressions.

This package is usable by both query and indexing (for computed indexes).

Expressions are evaluated within a context; this package provides a default context that can be used by indexing. The context includes a statement-level timestamp.

Expressions also provide support for query planning and processing; this includes equivalence testing, constant folding, etc.
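
A hedged sketch of this evaluation model, reusing the Value interface sketched above (the real interfaces in the expression package carry many more methods):

 import "time"

 // Sketch: every scalar expression evaluates against one input item.
 type Context interface {
     Now() time.Time // statement-level timestamp, shared across the statement
 }

 type Expression interface {
     Evaluate(item Value, context Context) (Value, error)
     EquivalentTo(other Expression) bool // equivalence testing for planning
 }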

The following types of scalar expressions are included:

  • arithmetic operators
  • CASE
  • Collection expressions (ANY / EVERY / ARRAY / FIRST)
  • Comparison operators (including IS operators)
  • String concat
  • Constants (including literals)
  • Functions
  • Identifiers
  • Navigation (fields, array indexing, array slicing)

Algebra

The algebra package defines the full algebra and AST (abstract syntax tree) for all N1QL statements (using the expression package for scalar expressions).

It includes aggregate functions, subquery expressions, parameter expressions, bucket references, and all the N1QL statements and clauses.

Aggregate functions

  • ARRAY_AGG(expr)

  • ARRAY_AGG(DISTINCT expr)

  • AVG(expr)

  • AVG(DISTINCT expr)

  • COUNT(*)

  • COUNT(expr)

  • COUNT(DISTINCT expr)

  • MAX(expr)

  • MIN(expr)

  • SUM(expr)

  • SUM(DISTINCT expr)

Plan

The plan package implements executable representations of queries. This includes both SELECTs and DML statements.

When we implement prepared statements, they will be represented as plans and stored as JSON documents or in-memory plan objects.

Plans are built from algebras using a visitor pattern. A separate planner / optimizer will be implemented for index selection.
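
In this pattern, each algebra node accepts a visitor that emits the corresponding plan operator. A minimal hedged sketch (the type names are illustrative, not the actual algebra types):

 // Sketch: building plan operators from algebra nodes via a visitor.
 type Visitor interface {
     VisitSelect(stmt *Select) (interface{}, error)
     VisitInsert(stmt *Insert) (interface{}, error)
     // ... one method per statement and clause type
 }

 // Each algebra node dispatches to its corresponding visitor method.
 func (s *Select) Accept(v Visitor) (interface{}, error) {
     return v.VisitSelect(s)
 }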

Plans include the following operators:

  • Scans

    • PrimaryScan: Scans a primary index.

    • IndexScan: Scans a secondary index.

    • KeyScan: Does not perform a scan. Directly treats the provided keys as the result of a scan.

    • ParentScan: Used for UNNEST. Treats the parent object as the result of a scan.

    • ValueScan: Used for the VALUES clause of INSERT and UPSERT statements. Treats the provided values as the result of a scan.

    • DummyScan: Used for SELECTs with no FROM clause. Provides a single empty object as the result of a scan.

    • CountScan: Used for SELECT COUNT(*) FROM bucket-name. Treats the bucket size as the result of a scan, without actually performing a full scan of the bucket.

    • IntersectScan: A container that scans its child scanners and intersects the results. Used for scanning multiple secondary indexes concurrently for a single query.

  • Fetch

  • Joins

    • Join

    • Nest

    • Unnest

  • Filter

  • Group: To enable data-parallelism, grouping is divided into three phases (see the sketch after this operator list). The first two phases can each be executed in a data-parallel fashion, and the final phase merges the results.

    • InitialGroup: Initial phase.

    • IntermediateGroup: Cumulate intermediate results. This phase can be chained.

    • FinalGroup: Compute final aggregate results.

  • Other SELECT operators

    • Project

    • Distinct

    • Order

    • Offset

    • Limit

    • Let

    • UnionAll: Combine the results of two queries. For UNION, we perform UNION ALL followed by DISTINCT.

  • Framework operators

    • Collect: Collect results into an array. Used for subqueries.

    • Discard: Discard results.

    • Stream: Stream results out. Used for returning results.

    • Parallel: A container that executes multiple copies of its child operator in parallel. Used for all data-parallelism.

    • Sequence: A container that chains its children into a sequence. Used for all execution pipelining.

  • DML operators

    • SendDelete

    • SendInsert

    • Set: Used for UPDATE.

    • Unset: Used for UPDATE.

    • Clone: Used for UPDATE. Clones data values so that UPDATEs read original values and mutate a clone.

    • SendUpdate

    • Merge
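
To make the three grouping phases concrete, here is a hedged Go sketch of AVG computed via partial aggregates (illustrative only; the actual operators work over Values rather than float64):

 // Sketch: three-phase grouping for AVG. Phases 1 and 2 are data-parallel.
 type partial struct {
     sum   float64
     count int64
 }

 // InitialGroup: fold a chunk of raw input into a partial aggregate.
 func initialGroup(values []float64) partial {
     p := partial{}
     for _, v := range values {
         p.sum += v
         p.count++
     }
     return p
 }

 // IntermediateGroup: cumulate partials; this step can be chained.
 func intermediateGroup(a, b partial) partial {
     return partial{sum: a.sum + b.sum, count: a.count + b.count}
 }

 // FinalGroup: compute the final aggregate from the merged partial.
 func finalGroup(p partial) float64 {
     if p.count == 0 {
         return 0
     }
     return p.sum / float64(p.count)
 }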

Execution

The execution package implements query execution. The objects in this package mirror those in the plan package, except that these are the running instances.

Golang channels are used extensively to implement concurrency and signaling.
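
For instance, a pipeline stage can be modeled as a goroutine that consumes an input channel and feeds an output channel; Sequence then chains stages by wiring channels together. A simplified sketch (not the actual execution types):

 // Sketch: one pipeline stage, e.g. a Filter, running in its own goroutine.
 func filterStage(in <-chan Value, pred func(Value) bool) <-chan Value {
     out := make(chan Value)
     go func() {
         defer close(out) // signal downstream operators that we are done
         for item := range in {
             if pred(item) {
                 out <- item
             }
         }
     }()
     return out
 }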

Subquery execution

The Context object supports subquery execution. It performs planning, execution, and collection of subquery results. It also performs plan and result caching for uncorrelated subqueries.

Datastore

The datastore package defines the interface to the underlying database server.

Some key differences from the previous datastore API (previously catalog API):

  • DML support

  • Use of channels for error handling and stop signaling

  • Generalized index interface that supports any combination of hash and range indexing
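
A hedged sketch of what channel-based error handling and stop signaling can look like (all names here, including fetchKey, are hypothetical, not the datastore package's real API):

 // Sketch: a fetch that reports results and errors on channels and
 // honors a stop channel for early termination.
 func fetch(keys []string, results chan<- Value, errs chan<- error, stop <-chan bool) {
     defer close(results)
     for _, k := range keys {
         select {
         case <-stop:
             return // the caller signaled early termination
         default:
         }
         v, err := fetchKey(k) // hypothetical single-key retrieval
         if err != nil {
             errs <- err
             continue
         }
         results <- v
     }
 }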

Parser

This package will contain the parser and lexer.

Server

This package will contain the main engine executable and listener.

Clustering

This package defines the interface to the underlying cluster management system.

It provides a common abstraction for cluster management, including the configuration and lifecycle of a cluster.

Accounting

This package will contain the interface to workload tracking and monitoring. Accounting data can cover metrics, statistics, event data, and potentially log data.

It provides a common abstraction for recording accounting data and services over accounting data.

Shell

This package will contain the client command-line shell.

Sort

This package provides a parallel sort. It was copied from the Golang source and basic parallelism was added, but it has not been fine-tuned.
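
The flavor of the idea, as a hedged sketch (this is not the package's actual algorithm; it simply sorts the two halves of a slice concurrently and merges them):

 import "sort"

 // Sketch: sort the halves in parallel, then merge into a scratch slice.
 func parallelSort(a []int) {
     mid := len(a) / 2
     done := make(chan struct{})
     go func() {
         sort.Ints(a[:mid])
         close(done)
     }()
     sort.Ints(a[mid:])
     <-done

     merged := make([]int, 0, len(a))
     i, j := 0, mid
     for i < mid && j < len(a) {
         if a[i] <= a[j] {
             merged = append(merged, a[i])
             i++
         } else {
             merged = append(merged, a[j])
             j++
         }
     }
     merged = append(merged, a[i:mid]...) // leftovers from the first half
     merged = append(merged, a[j:]...)    // leftovers from the second half
     copy(a, merged)
 }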

cbq

This package provides a client library that will be used by the command-line shell to encapsulate cluster-awareness and other connectivity concerns.

The library will implement the standard golang database APIs at database/sql and database/sql/driver.

The library will connect using the Query REST API and the Query Clustering API.
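
Once such a driver is registered, client code would look like standard golang database access. The driver import path, driver name, and endpoint below are assumptions for illustration only:

 package main

 import (
     "database/sql"
     "fmt"
     "log"

     _ "github.com/example/n1ql" // hypothetical driver registering itself as "n1ql"
 )

 func main() {
     db, err := sql.Open("n1ql", "http://localhost:8093") // assumed query REST endpoint
     if err != nil {
         log.Fatal(err)
     }
     defer db.Close()

     rows, err := db.Query("SELECT name FROM tutorial")
     if err != nil {
         log.Fatal(err)
     }
     defer rows.Close()
     for rows.Next() {
         var name string
         if err := rows.Scan(&name); err != nil {
             log.Fatal(err)
         }
         fmt.Println(name)
     }
 }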

Data parallelism

The query engine is designed to be highly data-parallel. By data-parallel, we mean that individual stages of the execution pipeline are parallelized over their input data. This is in addition to the parallelism achieved by giving each stage its own goroutine.
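
A minimal sketch of this fan-out, reusing the Value and stage shapes from the sketches above (not the engine's actual Parallel operator):

 import "sync"

 // Sketch: run n copies of a stage over one shared input channel and
 // fan their outputs into a single output channel, as Parallel does.
 func parallel(n int, in <-chan Value, stage func(<-chan Value) <-chan Value) <-chan Value {
     out := make(chan Value)
     var wg sync.WaitGroup
     for i := 0; i < n; i++ {
         wg.Add(1)
         go func() {
             defer wg.Done()
             for v := range stage(in) {
                 out <- v
             }
         }()
     }
     go func() {
         wg.Wait() // close the fan-in channel once every copy is done
         close(out)
     }()
     return out
 }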

Below, N1QL statement execution pipelines are listed, along with the data-parallelization and serialization points.

SELECT

  1. Scan
  2. Parallelize
  3. Fetch
  4. Join / Nest / Unnest
  5. Let (Common subexpressions)
  6. Where (Filter)
  7. GroupBy: Initial
  8. GroupBy: Intermediate
  9. Serialize
  10. GroupBy: Final
  11. Parallelize
  12. Letting (common aggregate subexpressions)
  13. Having (aggregate filtering)
  14. Serialize
  15. Order By (Sort)
  16. Parallelize
  17. Select (Projection)
  18. Serialize
  19. Distinct (De-duplication)
  20. Offset (Skipping)
  21. Limit

INSERT

  1. Scan
  2. Parallelize
  3. SendInsert
  4. Returning (Projection)

DELETE

  1. Scan
  2. Parallelize
  3. Fetch
  4. Let (Common subexpressions)
  5. Where (Filter)
  6. Serialize
  7. Limit
  8. Parallelize
  9. SendDelete
  10. Returning (Projection)

UPDATE

  1. Scan
  2. Parallelize
  3. Fetch
  4. Let (Common subexpressions)
  5. Where (Filter)
  6. Serialize
  7. Limit
  8. Parallelize
  9. Clone
  10. Set / Unset
  11. SendUpdate
  12. Returning (Projection)

Steps to create a build

Get a working repository

 $ export GOPATH=$HOME/query/
 $ mkdir -p $GOPATH/src/github.com/couchbase/
 $ cd ~/query
 $ mkdir bin pkg

Install the required goyacc tool and update the PATH to see it:

 $ mkdir -p $GOPATH/src/golang.org/x
 $ cd $GOPATH/src/golang.org/x
 $ git clone https://github.com/golang/tools.git
 $ cd tools/cmd/goyacc
 $ go build
 $ go install
 $ export PATH=$PATH:$GOPATH/bin/

Clone the query repo and build it:

 $ cd $GOPATH/src/github.com/couchbase/
 $ git clone https://github.com/couchbase/query query
 $ cd query 
 $ ./build.sh

By default, this builds the community edition of query. If you want the enterprise version (which includes schema inferencing), use:

 $ ./build.sh -tags "enterprise"

Each binary is built in its respective package directory: the cbq and cbq-engine binaries are in the shell and server directories.

Running a local build using local JSON files:

Pre-requisites:

  • cbq-engine binary
  • cbq binary
  • Data sample set zip file (a sample set of JSON documents)

Steps to run:

  1. Create a directory

    $ mkdir ~/sample_build/tutorial/data
    
  2. Copy the binaries cbq and cbq-engine into the ~/sample_build/ directory.

  3. Copy the data sample into the ~/sample_build/tutorial/data/ directory.

  4. Unzip the sample using the command

    $ unzip sampledb.zip
    
  5. Go back to the directory containing the binaries

    $ cd ~/sample_build/
    
  6. First run the cbq-engine executable, passing the -datastore "<path to the tutorial directory>" and -namespace <name of the subdirectory the data is in> flags. (An ampersand at the end of the command can be used to run the process in the background and get the prompt back.)

    $ ./cbq-engine -datastore "$HOME/sample_build/tutorial" -namespace data 
    
  7. Then run the cbq executable in a new terminal. This should give you the N1QL command line interface shell.

    $ ./cbq
    cbq> select * from tutorial;
    
  8. TIME TO EXPERIMENT ☺

Using the Admin UI

  1. Download the Couchbase server and install it (on macOS, add it to the Applications folder).

  2. Open up localhost:8091 and follow the setup instructions.

  3. Create your own buckets and fill in data.

  4. To connect N1QL with the Couchbase server, run the following commands in two terminals, one after the other.

    $ ./cbq-engine -datastore "http://127.0.0.1:8091/"
    $ ./cbq -u=<username> -p=<password> localhost:8091
    
  5. Run the following command on the created buckets before querying them

    cbq> create primary index on [bucket_name]  
    
  6. Run N1QL queries on the CLI.

NOTE: Ctrl + D should allow you to exit the running cbq and cbq-engine processes.
