Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Query-farm/marisa

Open more actions menu

Repository files navigation

DuckDB Marisa Extension by Query.Farm

The Marisa extension, developed by Query.Farm, adds MARISA (Matching Algorithm with Recursively Implemented StorAge) trie functionality for DuckDB. MARISA is a static and space-efficient trie data structure that enables fast string lookups, prefix searches, and predictive text operations.

Use Cases

MARISA tries are particularly useful for:

  • Autocomplete/Type-ahead functionality: Use marisa_predictive() to find all completions for a partial string
  • Spell checking: Use marisa_lookup() to verify if words exist in a dictionary
  • URL routing: Efficiently match URL patterns and extract parameters
  • IP address prefix matching: Network routing and firewall rules
  • String deduplication: Compact storage of large sets of strings with common prefixes

Installation

marisa is a DuckDB Community Extension.

You can now use this by using this SQL:

install marisa from community;
load marisa;

The marisa extension provides several functions for working with MARISA tries:

Creating a Trie

Use the marisa_trie() aggregate function to create a trie from string data:

CREATE TABLE employees(name TEXT);
INSERT INTO employees VALUES('Alice'), ('Bob'), ('Charlie'), ('David'), ('Eve'), ('Frank'), ('Mallory'), ('Megan'), ('Oscar'), ('Melissa');

-- Create a trie from the employee names
CREATE TABLE employees_trie AS SELECT marisa_trie(name) AS trie FROM employees;

SELECT trie, octet_length(trie) FROM employees_trie;
┌─────────────────────────────────────────────────────────────────┬────────────────────┐
│                              trie                               │ octet_length(trie) │
│                              blob                               │       int64        │
├─────────────────────────────────────────────────────────────────┼────────────────────┤
│ We love Marisa.\x00\x08\x00\x00\x00\x00\x00\x00\x00\xFD\x1B`\…  │        4160        │
└─────────────────────────────────────────────────────────────────┴────────────────────┘

Lookup Function

Check if a string exists in the trie using marisa_lookup():

-- Check if 'Alice' exists in the trie (returns true)
SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;
┌──────────────────────────────┐
│ marisa_lookup(trie, 'Alice') │
│           boolean            │
├──────────────────────────────┤
│ true                         │
└──────────────────────────────┘

-- Check if 'Unknown' exists in the trie (returns false)
SELECT marisa_lookup(trie, 'Unknown') FROM employees_trie;
┌────────────────────────────────┐
│ marisa_lookup(trie, 'Unknown') │
│            boolean             │
├────────────────────────────────┤
│ false                          │
└────────────────────────────────┘

Common Prefix Search

Find all strings in the trie that are prefixes of a given string using marisa_common_prefix():

CREATE TABLE countries(name TEXT);
INSERT INTO countries VALUES ('U'), ('US'), ('USA');
CREATE TABLE countries_trie AS SELECT marisa_trie(name) AS trie FROM countries;

-- Find all prefixes of 'USA' (returns ['U', 'US', 'USA'])
SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
┌───────────────────────────────────────┐
│ marisa_common_prefix(trie, 'USA', 10) │
│               varchar[]               │
├───────────────────────────────────────┤
│ [U, US, USA]                          │
└───────────────────────────────────────┘

Predictive Search

Find all strings in the trie that start with a given prefix using marisa_predictive():

-- Find all names starting with 'Me' (returns ['Megan', 'Melissa'])
SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
┌───────────────────────────────────┐
│ marisa_predictive(trie, 'Me', 10) │
│             varchar[]             │
├───────────────────────────────────┤
│ [Megan, Melissa]                  │
└───────────────────────────────────┘

Function Reference

marisa_trie(column)

Type: Aggregate Function Description: Creates a MARISA trie from string values in a column. Parameters:

  • column (VARCHAR): Column containing strings to build the trie from

Example:

SELECT marisa_trie(name) FROM employees;

marisa_lookup(trie, search_string)

Type: Scalar Function Description: Checks if a string exists in the trie. Parameters:

  • trie (BLOB): The trie created by marisa_trie()
  • search_string (VARCHAR): String to search for

Returns: BOOLEAN (true if found, false otherwise)

Example:

SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;

marisa_common_prefix(trie, search_string, max_results)

Type: Scalar Function Description: Finds all strings in the trie that are prefixes of the search string. Parameters:

  • trie (BLOB): The trie created by marisa_trie()
  • search_string (VARCHAR): String to find prefixes for
  • max_results (INTEGER): Maximum number of results to return

Returns: LIST(VARCHAR) - List of prefix matches

Example:

SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
-- Returns: ['U', 'US', 'USA']

marisa_predictive(trie, prefix, max_results)

Type: Scalar Function Description: Finds all strings in the trie that start with the given prefix. Parameters:

  • trie (BLOB): The trie created by marisa_trie()
  • prefix (VARCHAR): Prefix to search for
  • max_results (INTEGER): Maximum number of results to return

Returns: LIST(VARCHAR) - List of strings starting with the prefix

Example:

SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
-- Returns: ['Megan', 'Melissa']

About

The Marisa extension by Query.Farm integrates the fast, space-efficient MARISA trie into DuckDB, enabling high-performance string lookups, prefix searches, and autocomplete functionality.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Generated from duckdb/extension-template
Morty Proxy This is a proxified and sanitized view of the page, visit original site.