dataframe-persistent

Load SQLite into a dataframe. The schema is read for you: at runtime when you're exploring, or at compile time when you want type safety. You don't write a persistent entity, a persistLowerCase block, or any instances.

This README is a runnable scripths notebook. Every Haskell block runs top-to-bottom in one shared session against ./data/chinook.db. Reproduce every output below with scripths docs/base_scripts/base_readme.md -o README.md run from dataframe-persistent/.

Three ways in

Tier	You write	You get
Runtime	`readTable db "artists"`	a `DataFrame`, types inferred from the schema
Typed	`$(declareTable db "artists")` + `readTableTyped @Schema`	a compile-time schema type; columns checked by `col @"Name"`
Persistent	`$(declareEntity db "artists")`	a full `persistent` entity: typed `Filter` DSL, write-back

Tier 0: runtime reads

This is the quickest way in, similar to pandas' read_sql_table / read_sql. You point at a file and a table.

-- cabal: build-depends: dataframe, dataframe-persistent, text
-- cabal: default-extensions: OverloadedStrings, TemplateHaskell, DataKinds
-- cabal: default-extensions: TypeApplications, TypeOperators, FlexibleContexts
import qualified DataFrame as D
import DataFrame.IO.Persistent.Read
import Data.Function ((&))

What tables are in this database?

listTables "./data/chinook.db"

["albums","artists","customers","employees","genres","invoice_items","invoices","media_types","playlist_track","playlists","tracks"]

describeTable shows a table's columns and their inferred types before you load it (it returns a describeColumns-style DataFrame):

D.toMarkdown' <$> describeTable "./data/chinook.db" "artists"

Column Name
Text Type
Text SQLite Type
Text Nullable
Bool Primary Key
Bool

ArtistId Int INTEGER False True

Name Maybe Text NVARCHAR(120) True False

Load the whole table. The column types (and nullability) come from the schema:

D.toMarkdown' . D.take 5 <$> readTable "./data/chinook.db" "artists"

ArtistId
Int Name
Maybe Text

1 Just "AC/DC"

2 Just "Accept"

3 Just "Aerosmith"

4 Just "Alanis Morissette"

5 Just "Alice In Chains"

Tables larger than memory? Filter them in the database. Raw SQL works as you'd expect:

D.toMarkdown' <$> readSql "./data/chinook.db" "SELECT * FROM artists WHERE Name LIKE 'A%' LIMIT 5"

ArtistId
Int Name
Text

1 AC/DC

2 Accept

3 Aerosmith

4 Alanis Morissette

5 Alice In Chains

Or push a WHERE / LIMIT down to a named table with a ReadQuery value:

D.toMarkdown' <$> readTableWith "./data/chinook.db" "artists" (allRows & limit 3)

ArtistId
Int Name
Maybe Text

1 Just "AC/DC"

2 Just "Accept"

3 Just "Aerosmith"

Tier 1: typed schema

declareTable reads the schema at compile time and emits just the schema type. You read into it with readTableTyped, where the schema is a type argument and the database and table are ordinary values. Column references go through col @"Name", checked against the schema, so a typo or a wrong type is a compile error. Nothing is keyed on a generated function name.

import qualified DataFrame.Typed as DT
import DataFrame.Typed ((.==.))
import DataFrame.IO.Persistent.Schema (declareTable)

The splice brings one thing into scope, the type type ArtistsSchema = '[Column "ArtistId" Int, Column "Name" (Maybe Text)]:

$(declareTable "./data/chinook.db" "artists")

readTableTyped @ArtistsSchema reads any database/table into a TypedDataFrame ArtistsSchema (it validates the schema as it reads). You can bind your own reader:

artists = readTableTyped @ArtistsSchema "./data/chinook.db" "artists"

thaw drops a typed frame back to an untyped DataFrame:

D.toMarkdown' . D.take 5 . DT.thaw <$> artists

ArtistId
Int Name
Maybe Text

1 Just "AC/DC"

2 Just "Accept"

3 Just "Aerosmith"

4 Just "Alanis Morissette"

5 Just "Alice In Chains"

Column access is checked against the schema. col @"Name" only compiles because "Name" is a column of ArtistsSchema (its element type is Maybe Text):

DT.columnAsList @"Name" . DT.take 3 <$> artists

[Just "AC/DC",Just "Accept",Just "Aerosmith"]

A filter on a column that doesn't exist (or has the wrong type) is a compile error rather than a runtime surprise:

D.toMarkdown' . DT.thaw . DT.filterWhere (DT.col @"Name" .==. DT.lit (Just "Accept")) <$> artists

ArtistId
Int Name
Maybe Text

2 Just "Accept"

Because the database is a value, reading the same table from two sources to join them is just two calls with the same @ArtistsSchema:

a <- readTableTyped @ArtistsSchema "europe.sqlite" "artists"
b <- readTableTyped @ArtistsSchema "us.sqlite"     "artists"
-- DT.thaw a / DT.thaw b, then DataFrame.innerJoin on "ArtistId", etc.

(readSqlTyped @cols db "SELECT ... JOIN ..." does the same for an arbitrary query. The Postgres section below reads this same ArtistsSchema from a different backend.)

Tier 2: generate a `persistent` entity

If you want the full persistent experience (the typed Filter DSL, relations, write-back), declareEntity builds the entity from the live schema, so you skip the persistLowerCase block. It needs the usual persistent extensions and unqualified Text / Int64 in scope:

-- cabal: build-depends: persistent, persistent-sqlite
-- cabal: default-extensions: GADTs, TypeFamilies, FlexibleInstances, MultiParamTypeClasses
-- cabal: default-extensions: StandaloneDeriving, DerivingStrategies, GeneralizedNewtypeDeriving
-- cabal: default-extensions: UndecidableInstances
import Data.Int (Int64)
import Data.Text (Text)
import Database.Persist (Filter, SelectOpt (..), (<-.), (==.))
import Database.Persist.Sqlite (runSqlite)
import DataFrame.IO.Persistent.Schema (declareEntity)

This generates the Albums entity (Id, title, artistId), bound to the real albums / AlbumId / Title / ArtistId SQLite names, plus its EntityField constructors:

$(declareEntity "./data/chinook.db" "albums")

Now load it with selectToDataFrame. It's a generic loader: it works for any persistent entity, so the one declareEntity splice is all the boilerplate.

selectToDataFrame
    :: (MonadIO m, PersistEntity r, PersistEntityBackend r ~ SqlBackend)
    => [Filter r]      -- persistent's typed WHERE DSL, e.g. [AlbumsArtistId ==. 1]
    -> [SelectOpt r]   -- ordering / paging,        e.g. [Asc AlbumsTitle, LimitTo 10]
    -> ReaderT SqlBackend m DataFrame

It runs persistent's selectList filters opts and turns the [Entity Albums] into a DataFrame: the entity key becomes an id column and each field becomes a column (named by its Haskell field, e.g. title, artistId), with element types decoded from the stored values. The [Filter Albums] and [SelectOpt Albums] you pass are persistent's own, checked against the entity, so a filter for the wrong field won't compile.

Find AC/DC's albums (ArtistId 1):

D.toMarkdown' <$> runSqlite "./data/chinook.db" (selectToDataFrame [AlbumsArtistId ==. 1] [])

id
Int title
Text artistId
Int

1 For Those About To Rock We Salute You 1

4 Let There Be Rock 1

You can use both arguments together: filter and order/page in one query. Here are the albums by AC/DC or Alanis Morissette (ArtistId 1 or 4), sorted by title, capped at 5. (<-. is persistent's "field in list"; an empty filter list loads everything.)

D.toMarkdown' <$> runSqlite "./data/chinook.db"
    (selectToDataFrame [AlbumsArtistId <-. [1, 4]] [Asc AlbumsTitle, LimitTo 5])

id
Int title
Text artistId
Int

1 For Those About To Rock We Salute You 1

6 Jagged Little Pill 4

4 Let There Be Rock 1

Hand-off between dataframe and persistent

The ...Conn readers run on a ReaderT SqlBackend m, so a raw read, an entity read, and ordinary persistent queries all compose in one runSqlite transaction (this works on Postgres/MySQL too). Here both reads share one connection and return each frame's (rows, columns):

runSqlite "./data/chinook.db" $ do
    artists <- readTableConn "artists"                       -- raw → DataFrame
    albums  <- selectToDataFrame ([] :: [Filter Albums]) []  -- entity → DataFrame
    pure (D.dimensions artists, D.dimensions albums)

((275,2),(347,3))

PostgreSQL (and other backends)

Everything except the file-path helpers and the compile-time splices is backend-agnostic: the ...Conn readers and selectToDataFrame run on any persistent SqlBackend. Introspection picks PRAGMA for SQLite and information_schema for PostgreSQL/MySQL. The library doesn't depend on persistent-postgresql; you bring the connection.

These examples run against a real PostgreSQL with the same artists / albums data, and the notebook starts it itself. The block below uses GHCi's :! shell escape to spin up a throwaway cluster on port 54329 and load the two tables from the SQLite fixture. It needs postgres and sqlite3 on PATH (e.g. brew install postgresql@16). A matching teardown runs at the end of the section.

:! bash scripts/pg-setup.sh

postgres ready on port 54329 (db chinook: artists + albums)

Open a connection (you supply persistent-postgresql; runPg is the usual runSqlConn wrapper):

-- cabal: build-depends: persistent, persistent-postgresql, transformers, resourcet, monad-logger
import Control.Monad.Trans.Reader (ReaderT)
import Control.Monad.Trans.Resource (ResourceT, runResourceT)
import Control.Monad.Logger (NoLoggingT, runNoLoggingT)
import Database.Persist.Sql (SqlBackend)
import Database.Persist.Postgresql (runSqlConn, withPostgresqlConn)

runPg :: ReaderT SqlBackend (ResourceT (NoLoggingT IO)) a -> IO a
runPg act =
    runNoLoggingT (runResourceT (withPostgresqlConn "host=localhost port=54329 dbname=chinook user=postgres" (runSqlConn act)))

Discovery and reads use the same functions as SQLite, just ...Conn wrapped in runPg:

runPg listTablesConn

["albums","artists"]

D.toMarkdown' <$> runPg (describeTableConn "artists")

Column Name
Text Type
Text SQLite Type
Text Nullable
Bool Primary Key
Bool

ArtistId Int integer False True

Name Maybe Text text True False

D.toMarkdown' . D.take 5 <$> runPg (readTableConn "artists")

ArtistId
Int Name
Maybe Text

1 Just "AC/DC"

2 Just "Accept"

3 Just "Aerosmith"

4 Just "Alanis Morissette"

5 Just "Alice In Chains"

The typed reader is backend-agnostic too. The same ArtistsSchema generated from the SQLite file in Tier 1 validates this PostgreSQL read, so it's one schema type across two databases:

D.toMarkdown' . D.take 3 . DT.thaw <$> runPg (readTableTypedConn @ArtistsSchema "artists")

ArtistId
Int Name
Maybe Text

1 Just "AC/DC"

2 Just "Accept"

3 Just "Aerosmith"

The Albums entity generated from SQLite at compile time is a plain persistent entity, so the same declareEntity splice and selectToDataFrame run unchanged against PostgreSQL:

D.toMarkdown' <$> runPg (selectToDataFrame [AlbumsArtistId <-. [1, 4]] [Asc AlbumsTitle, LimitTo 5])

id
Int title
Text artistId
Int

1 For Those About To Rock We Salute You 1

6 Jagged Little Pill 4

4 Let There Be Rock 1

Tear the throwaway cluster back down (again via :!):

:! bash scripts/pg-teardown.sh

postgres stopped and removed

How types are inferred

SQLite type affinities map to Haskell element types as below. NOT NULL columns are non-null, everything else becomes Maybe. You can override a column with typeOverride in DeclareOptions (declareTableWith / declareEntityWith).

SQLite declared type	Haskell type
`INTEGER`, `INT...`	`Int`
`REAL`, `FLOAT`, `DOUBLE`, `NUMERIC`, `DECIMAL`	`Double`
`TEXT`, `VARCHAR`, `CHAR`, `CLOB`	`Text`
`BOOLEAN`	`Bool`
`BLOB` / no declared type	`ByteString`
`DATE`	`Day`
`DATETIME`, `TIMESTAMP`	`UTCTime`
`TIME`	`TimeOfDay`

Install

build-depends: dataframe, dataframe-persistent, persistent-sqlite

The classic path still works and is unchanged: hand-write a persistent entity with share/persistLowerCase and derivePersistentDataFrame, then load with fromPersistent.

Name	Name	Last commit message	Last commit date
parent directory ..
data	data
docs/base_scripts	docs/base_scripts
scripts	scripts
src/DataFrame/IO	src/DataFrame/IO
tests	tests
CHANGELOG.md	CHANGELOG.md
LICENSE	LICENSE
README.md	README.md
dataframe-persistent.cabal	dataframe-persistent.cabal

id Int	title Text	artistId Int
1	For Those About To Rock We Salute You	1
4	Let There Be Rock	1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand file tree

README.md

dataframe-persistent

Three ways in

Tier 0: runtime reads

Tier 1: typed schema

Tier 2: generate a `persistent` entity

Hand-off between dataframe and persistent

PostgreSQL (and other backends)

How types are inferred

Install

Column Name Text	Type Text	SQLite Type Text	Nullable Bool	Primary Key Bool
ArtistId	Int	INTEGER	False	True
Name	Maybe Text	NVARCHAR(120)	True	False

ArtistId Int	Name Maybe Text
1	Just "AC/DC"
2	Just "Accept"
3	Just "Aerosmith"
4	Just "Alanis Morissette"
5	Just "Alice In Chains"

id Int	title Text	artistId Int
1	For Those About To Rock We Salute You	1
6	Jagged Little Pill	4
4	Let There Be Rock	1

Search code, repositories, users, issues, pull requests...

FilesExpand file tree

dataframe-persistent

Directory actions

More options

Directory actions

More options

Latest commit

History

dataframe-persistent

Folders and files

parent directory

README.md

dataframe-persistent

Three ways in

Tier 0: runtime reads

Tier 1: typed schema

Tier 2: generate a persistent entity

Hand-off between dataframe and persistent

PostgreSQL (and other backends)

How types are inferred

Install

Expand file tree

Tier 2: generate a `persistent` entity