Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

oforero/dataframe

Open more actions menu
 
 

Repository files navigation

DataFrame

An intuitive, dynamically-typed DataFrame library.

A tool for exploratory data analysis.

Installing

CLI

  • Install Haskell (ghc + cabal) via ghcup selecting all the default options.
  • To install dataframe run cabal update && cabal install dataframe
  • Open a Haskell repl with dataframe loaded by running cabal repl --build-depends dataframe.
  • Follow along any one of the tutorials below.

Jupyter notebook

  • Jupyter notebook is still underway with some local tests/examples in the works.
  • For a preview check out the California Housing notebook.

What is exploratory data analysis?

We provide a primer here and show how to do some common analyses.

Coming from other dataframe libraries

Familiar with another dataframe library? Get started:

Example usage

Code example

import qualified DataFrame as D

import DataFrame ((|>))

main :: IO ()
    df <- D.readTsv "./data/chipotle.tsv"
    print $ df
      |> D.select ["item_name", "quantity"]
      |> D.groupBy ["item_name"]
      |> D.aggregate (zip (repeat "quantity") [D.Maximum, D.Mean, D.Sum])
      |> D.sortBy D.Descending ["Sum_quantity"]

Output:

----------------------------------------------------------------------------------------------------
index |               item_name               | Sum_quantity |   Mean_quantity    | Maximum_quantity
------|---------------------------------------|--------------|--------------------|-----------------
 Int  |                 Text                  |     Int      |       Double       |       Int       
------|---------------------------------------|--------------|--------------------|-----------------
0     | Chips and Fresh Tomato Salsa          | 130          | 1.1818181818181819 | 15              
1     | Izze                                  | 22           | 1.1                | 3               
2     | Nantucket Nectar                      | 31           | 1.1481481481481481 | 3               
3     | Chips and Tomatillo-Green Chili Salsa | 35           | 1.1290322580645162 | 3               
4     | Chicken Bowl                          | 761          | 1.0482093663911847 | 3               
5     | Side of Chips                         | 110          | 1.0891089108910892 | 8               
6     | Steak Burrito                         | 386          | 1.048913043478261  | 3               
7     | Steak Soft Tacos                      | 56           | 1.018181818181818  | 2               
8     | Chips and Guacamole                   | 506          | 1.0563674321503131 | 4               
9     | Chicken Crispy Tacos                  | 50           | 1.0638297872340425 | 2

Full example in ./app folder using many of the constructs in the API.

Visual example

Screencast of usage in GHCI

Future work

  • Jupyter/ihaskell support (soon)
  • Apache arrow and Parquet compatability
  • Integration with common data formats (currently only supports CSV)
  • Support windowed plotting (currently only supports ASCII plots)
  • Create a lazy API that builds an execution graph instead of running eagerly (will be used to compute on files larger than RAM)

Contributing

  • Please first submit an issue and we can discuss there.

About

An intuitive, dynamically-typed DataFrame library.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Haskell 98.7%
  • Other 1.3%
Morty Proxy This is a proxified and sanitized view of the page, visit original site.