Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

richardstartin/splitmap

Open more actions menu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

96 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

splitmap

Build Status Coverage Status

This library builds on top of RoaringBitmap to provide a parallel implementation of boolean circuits (multidimensional filters) and arbitrary aggregations over filters.

For instance, to compute a sum product on a dataset filtered such that only one of two conditions holds:

    PrefixIndex<ChunkedDoubleArray> quantities = ...
    PrefixIndex<ChunkedDoubleArray> prices = ...
    SplitMap februarySalesIndex = ...
    SplitMap luxuryProductsIndex = ...
    QueryContext<String, PriceQty> context = new QueryContext<>(
    Map.ofEntries(entry("luxuryProducts", luxuryProductsIndex), entry("febSales", februarySalesIndex), 
    Map.ofEntries(entry(PRICE, prices), entry(QTY, quantities)))); 

    double februaryRevenueFromLuxuryProducts = 
            Circuits.evaluateIfKeysIntersect(context, slice -> slice.get("febSales").and(slice.get("luxuryProducts")), "febSales", "luxuryProducts")
            .stream()
            .parallel()
            .mapToDouble(partition -> partition.reduceDouble(SumProduct.<PriceQty>reducer(price, quantities)))
            .sum();

Which, over millions of quantities and prices, can be computed in under 200 microseconds on a modern processor, where parallel streams may take upwards of 20ms.

It is easy to write arbitrary routines combining filtering, calculation and aggregation. For example statistical calculations evaluated with filter criteria.

  public double productMomentCorrelationCoefficient() {
    // calculate the correlation coefficient between prices observed on different exchanges
    PrefixIndex<ChunkedDoubleArray> exchange1Prices = ...
    PrefixIndex<ChunkedDoubleArray> exchange2Prices = ...
    SplitMap beforeClose = ...
    SplitMap afterOpen = ...
    QueryContext<Exchange, PriceQty> context = new QueryContext<>(
    Map.ofEntries(entry(BEFORE_CLOSE, beforeClose), entry(AFTER_OPEN, afterOpen), 
    Map.ofEntries(entry(NASDAQ, exchange1Prices), entry(LSE, exchange2Prices)))); 
    // evaluate product moment correlation coefficient 
    return Circuits.evaluate(context, slice -> slice.get(BEFORE_CLOSE).or(slice.get(AFTER_OPEN)), 
            BEFORE_CLOSE, AFTER_OPEN) 
            .stream()
            .parallel()
            .map(partition -> partition.reduce(SimpleLinearRegression.<Exchanges>reducer(exchange1Prices, exchange2Prices)))
            .collect(SimpleLinearRegression.pmcc());
  }

About

Parallel boolean circuit evaluation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

Morty Proxy This is a proxified and sanitized view of the page, visit original site.