DataFrame

User guide | Discord

A fast, safe, and intuitive DataFrame library.

Why use this DataFrame library?

Encourages concise, declarative, and composable data pipelines.
Static typing makes code easier to reason about and catches many bugs at compile time—before your code ever runs.
Delivers high performance thanks to Haskell’s optimizing compiler and efficient memory model.
Designed for interactivity: expressive syntax, helpful error messages, and sensible defaults.
Works seamlessly in both command-line and notebook environments—great for exploration and scripting alike.

Example usage

Interactive environment

Key features in example:

Intuitive, SQL-like API to get from data to insights.
Create typed, completion-ready references to the columns of a dataframe using :exposeColumns.
Type-safe column transformations for faster and safer exploration.
Fluid, chaining API that makes code easy to reason about.
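
To make the chaining style concrete, here is a minimal sketch that uses only `base`. It defines a local `|>` as plain reverse function application, which is an assumption about how such an operator behaves rather than the library's own definition. The `Order` record, `sampleOrders`, and `topItems` are hypothetical names made up for this illustration.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical stand-in for a chaining operator:
-- feed the value on the left into the function on the right.
infixl 1 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Hypothetical row type used only for this illustration.
data Order = Order { itemName :: String, quantity :: Int }
  deriving (Show, Eq)

-- Made-up rows; not taken from any real data set.
sampleOrders :: [Order]
sampleOrders =
  [ Order "Chicken Bowl" 2
  , Order "Chips" 1
  , Order "Steak Burrito" 3
  , Order "Bottled Water" 1
  ]

-- Reads top to bottom: keep rows with quantity above 1, sort by quantity
-- in descending order, keep the first two, then project the item names.
topItems :: [Order] -> [String]
topItems orders =
  orders
    |> filter ((> 1) . quantity)
    |> sortOn (Down . quantity)
    |> take 2
    |> map itemName

main :: IO ()
main = print (topItems sampleOrders)
```

Because each step is an ordinary function, a step can be added, removed, or reordered without touching the others, which is the property the chaining style is meant to provide.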

Standalone script example

-- Useful Haskell extensions.
{-# LANGUAGE OverloadedStrings #-} -- Allow string literals to be interpreted as other string types (e.g. Text).
{-# LANGUAGE TypeApplications #-} -- Convenience syntax for specifying a type: `sum a b :: Int` vs `sum @Int a b`.

import qualified DataFrame as D -- import for general functionality.
import qualified DataFrame.Functions as F -- import for column expressions.

import DataFrame ((|>)) -- import the chaining operator unqualified.

main :: IO ()
main = do
    df <- D.readTsv "./data/chipotle.tsv"
    let quantity = F.col "quantity" :: D.Expr Int -- A typed reference to a column.
    print (df
      |> D.select ["item_name", "quantity"]
      |> D.groupBy ["item_name"]
      |> D.aggregate [ (F.sum quantity)     `F.as` "sum_quantity"
                     , (F.mean quantity)    `F.as` "mean_quantity"
                     , (F.maximum quantity) `F.as` "maximum_quantity"
                     ]
      |> D.sortBy D.Descending ["sum_quantity"]
      |> D.take 10)

Output:

-------------------------------------------------------------------------------------------
index |          item_name           | sum_quantity |   mean_quantity    | maximum_quantity
------|------------------------------|--------------|--------------------|-----------------
 Int  |             Text             |     Int      |       Double       |       Int
------|------------------------------|--------------|--------------------|-----------------
0     | Chicken Bowl                 | 761          | 1.0482093663911847 | 3
1     | Chicken Burrito              | 591          | 1.0687160940325497 | 4
2     | Chips and Guacamole          | 506          | 1.0563674321503131 | 4
3     | Steak Burrito                | 386          | 1.048913043478261  | 3
4     | Canned Soft Drink            | 351          | 1.1661129568106312 | 4
5     | Chips                        | 230          | 1.0900473933649288 | 3
6     | Steak Bowl                   | 221          | 1.04739336492891   | 3
7     | Bottled Water                | 211          | 1.3024691358024691 | 10
8     | Chips and Fresh Tomato Salsa | 130          | 1.1818181818181819 | 15
9     | Canned Soda                  | 126          | 1.2115384615384615 | 4

A full example that uses many of the constructs in the API is in the ./examples folder.
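
For readers who want to see what the group-and-aggregate step in the script computes, here is a rough sketch of the same idea over plain Haskell lists. It does not use the DataFrame API and is not how the library implements aggregation. It assumes the `containers` package (which ships with GHC) is available, and `Summary`, `summarize`, and `sampleRows` are hypothetical names with made-up data.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical per-group summary mirroring the three aggregates in the script.
data Summary = Summary
  { sumQuantity :: Int
  , meanQuantity :: Double
  , maximumQuantity :: Int
  } deriving (Show, Eq)

-- Group (item, quantity) pairs by item name, then reduce each group.
summarize :: [(String, Int)] -> M.Map String Summary
summarize rows = M.map toSummary groups
  where
    -- Collect the quantities seen for each item name.
    -- Every group holds at least one quantity, so `maximum` is safe below.
    groups = M.fromListWith (++) [(item, [q]) | (item, q) <- rows]
    toSummary qs =
      Summary
        { sumQuantity = sum qs
        , meanQuantity = fromIntegral (sum qs) / fromIntegral (length qs)
        , maximumQuantity = maximum qs
        }

-- Made-up rows for illustration; not taken from the Chipotle data set.
sampleRows :: [(String, Int)]
sampleRows =
  [ ("Chips", 1)
  , ("Chicken Bowl", 2)
  , ("Chips", 3)
  , ("Chicken Bowl", 1)
  , ("Chips", 2)
  ]

main :: IO ()
main = mapM_ print (M.toList (summarize sampleRows))
```

In this sketch the mean is a `Double` while the sum and maximum stay `Int`, which matches the column types shown in the output table.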

Installing

Jupyter notebook

We have a hosted version of the Jupyter notebook on Azure sites. It runs on Azure's free tier, so it can only support 3 or 4 kernels at a time.
To get started quickly, use the Dockerfile in ihaskell-dataframe to build and run an image with dataframe integration.
For a preview check out the California Housing notebook.

CLI

Run the installation script: curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/install.sh | sh
Download the run script with: curl --output dataframe "https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/dataframe.sh"
Make the script executable: chmod +x dataframe
Add the script's directory to your path: export PATH="$PATH:$(pwd)"
Run the script with: dataframe

What is exploratory data analysis?

We provide a primer here and show how to do some common analyses.

Coming from other dataframe libraries

Familiar with another dataframe library? Get started:

Supported input formats

CSV
Apache Parquet

Supported output formats

CSV

Future work

Apache Arrow compatibility
Integration with more common data formats (currently reads CSV and Apache Parquet and writes CSV)
Support windowed plotting (currently only supports ASCII plots)
Host the whole library + Jupyter lab on Azure with auth and isolation.
