Load SQLite into a dataframe. The schema is read
for you: at runtime when you're exploring, or at compile time when you want type safety. You don't
write a persistent entity, a persistLowerCase block, or any instances.
This README is a runnable scripths notebook. Every Haskell block runs top-to-bottom in one shared session against
./data/chinook.db. Reproduce every output below withscripths docs/base_scripts/base_readme.md -o README.mdrun fromdataframe-persistent/.
| Tier | You write | You get |
|---|---|---|
| Runtime | readTable db "artists" |
a DataFrame, types inferred from the schema |
| Typed | $(declareTable db "artists") + readTableTyped @Schema |
a compile-time schema type; columns checked by col @"Name" |
| Persistent | $(declareEntity db "artists") |
a full persistent entity: typed Filter DSL, write-back |
This is the quickest way in, similar to pandas' read_sql_table / read_sql. You point at a file
and a table.
-- cabal: build-depends: dataframe, dataframe-persistent, text
-- cabal: default-extensions: OverloadedStrings, TemplateHaskell, DataKinds
-- cabal: default-extensions: TypeApplications, TypeOperators, FlexibleContexts
import qualified DataFrame as D
import DataFrame.IO.Persistent.Read
import Data.Function ((&))What tables are in this database?
listTables "./data/chinook.db"["albums","artists","customers","employees","genres","invoice_items","invoices","media_types","playlist_track","playlists","tracks"]
describeTable shows a table's columns and their inferred types before you load it (it returns a
describeColumns-style DataFrame):
D.toMarkdown' <$> describeTable "./data/chinook.db" "artists"
Column Name
TextType
TextSQLite Type
TextNullable
BoolPrimary Key
BoolArtistId Int INTEGER False True Name Maybe Text NVARCHAR(120) True False
Load the whole table. The column types (and nullability) come from the schema:
D.toMarkdown' . D.take 5 <$> readTable "./data/chinook.db" "artists"
ArtistId
IntName
Maybe Text1 Just "AC/DC" 2 Just "Accept" 3 Just "Aerosmith" 4 Just "Alanis Morissette" 5 Just "Alice In Chains"
Tables larger than memory? Filter them in the database. Raw SQL works as you'd expect:
D.toMarkdown' <$> readSql "./data/chinook.db" "SELECT * FROM artists WHERE Name LIKE 'A%' LIMIT 5"
ArtistId
IntName
Text1 AC/DC 2 Accept 3 Aerosmith 4 Alanis Morissette 5 Alice In Chains
Or push a WHERE / LIMIT down to a named table with a ReadQuery value:
D.toMarkdown' <$> readTableWith "./data/chinook.db" "artists" (allRows & limit 3)
ArtistId
IntName
Maybe Text1 Just "AC/DC" 2 Just "Accept" 3 Just "Aerosmith"
declareTable reads the schema at compile time and emits just the schema type. You read into it
with readTableTyped, where the schema is a type argument and the database and table are ordinary
values. Column references go through col @"Name", checked against the schema, so a typo or a wrong
type is a compile error. Nothing is keyed on a generated function name.
import qualified DataFrame.Typed as DT
import DataFrame.Typed ((.==.))
import DataFrame.IO.Persistent.Schema (declareTable)The splice brings one thing into scope, the type
type ArtistsSchema = '[Column "ArtistId" Int, Column "Name" (Maybe Text)]:
$(declareTable "./data/chinook.db" "artists")readTableTyped @ArtistsSchema reads any database/table into a TypedDataFrame ArtistsSchema (it
validates the schema as it reads). You can bind your own reader:
artists = readTableTyped @ArtistsSchema "./data/chinook.db" "artists"thaw drops a typed frame back to an untyped DataFrame:
D.toMarkdown' . D.take 5 . DT.thaw <$> artists
ArtistId
IntName
Maybe Text1 Just "AC/DC" 2 Just "Accept" 3 Just "Aerosmith" 4 Just "Alanis Morissette" 5 Just "Alice In Chains"
Column access is checked against the schema. col @"Name" only compiles because "Name" is a
column of ArtistsSchema (its element type is Maybe Text):
DT.columnAsList @"Name" . DT.take 3 <$> artists[Just "AC/DC",Just "Accept",Just "Aerosmith"]
A filter on a column that doesn't exist (or has the wrong type) is a compile error rather than a runtime surprise:
D.toMarkdown' . DT.thaw . DT.filterWhere (DT.col @"Name" .==. DT.lit (Just "Accept")) <$> artists
ArtistId
IntName
Maybe Text2 Just "Accept"
Because the database is a value, reading the same table from two sources to join them is just two
calls with the same @ArtistsSchema:
a <- readTableTyped @ArtistsSchema "europe.sqlite" "artists"
b <- readTableTyped @ArtistsSchema "us.sqlite" "artists"
-- DT.thaw a / DT.thaw b, then DataFrame.innerJoin on "ArtistId", etc.
(readSqlTyped @cols db "SELECT ... JOIN ..." does the same for an arbitrary query. The Postgres
section below reads this same ArtistsSchema from a different backend.)
If you want the full persistent experience (the typed Filter DSL, relations, write-back),
declareEntity builds the entity from the live schema, so you skip the persistLowerCase block. It
needs the usual persistent extensions and unqualified Text / Int64 in scope:
-- cabal: build-depends: persistent, persistent-sqlite
-- cabal: default-extensions: GADTs, TypeFamilies, FlexibleInstances, MultiParamTypeClasses
-- cabal: default-extensions: StandaloneDeriving, DerivingStrategies, GeneralizedNewtypeDeriving
-- cabal: default-extensions: UndecidableInstances
import Data.Int (Int64)
import Data.Text (Text)
import Database.Persist (Filter, SelectOpt (..), (<-.), (==.))
import Database.Persist.Sqlite (runSqlite)
import DataFrame.IO.Persistent.Schema (declareEntity)This generates the Albums entity (Id, title, artistId), bound to the real
albums / AlbumId / Title / ArtistId SQLite names, plus its EntityField constructors:
$(declareEntity "./data/chinook.db" "albums")Now load it with selectToDataFrame. It's a generic loader: it works for any persistent entity,
so the one declareEntity splice is all the boilerplate.
selectToDataFrame
:: (MonadIO m, PersistEntity r, PersistEntityBackend r ~ SqlBackend)
=> [Filter r] -- persistent's typed WHERE DSL, e.g. [AlbumsArtistId ==. 1]
-> [SelectOpt r] -- ordering / paging, e.g. [Asc AlbumsTitle, LimitTo 10]
-> ReaderT SqlBackend m DataFrame
It runs persistent's selectList filters opts and turns the [Entity Albums] into a DataFrame:
the entity key becomes an id column and each field becomes a column (named by its
Haskell field, e.g. title, artistId), with element types decoded from the stored values. The
[Filter Albums] and [SelectOpt Albums] you pass are persistent's own, checked against the
entity, so a filter for the wrong field won't compile.
Find AC/DC's albums (ArtistId 1):
D.toMarkdown' <$> runSqlite "./data/chinook.db" (selectToDataFrame [AlbumsArtistId ==. 1] [])
id
Inttitle
TextartistId
Int1 For Those About To Rock We Salute You 1 4 Let There Be Rock 1
You can use both arguments together: filter and order/page in one query. Here are the albums by
AC/DC or Alanis Morissette (ArtistId 1 or 4), sorted by title, capped at 5. (<-. is persistent's
"field in list"; an empty filter list loads everything.)
D.toMarkdown' <$> runSqlite "./data/chinook.db"
(selectToDataFrame [AlbumsArtistId <-. [1, 4]] [Asc AlbumsTitle, LimitTo 5])
id
Inttitle
TextartistId
Int1 For Those About To Rock We Salute You 1 6 Jagged Little Pill 4 4 Let There Be Rock 1
The ...Conn readers run on a ReaderT SqlBackend m, so a raw read, an entity read, and ordinary
persistent queries all compose in one runSqlite transaction (this works on Postgres/MySQL too).
Here both reads share one connection and return each frame's (rows, columns):
runSqlite "./data/chinook.db" $ do
artists <- readTableConn "artists" -- raw → DataFrame
albums <- selectToDataFrame ([] :: [Filter Albums]) [] -- entity → DataFrame
pure (D.dimensions artists, D.dimensions albums)((275,2),(347,3))
Everything except the file-path helpers and the compile-time splices is backend-agnostic: the
...Conn readers and selectToDataFrame run on any persistent SqlBackend. Introspection picks
PRAGMA for SQLite and information_schema for PostgreSQL/MySQL. The library doesn't depend on
persistent-postgresql; you bring the connection.
These examples run against a real PostgreSQL with the same artists / albums data, and the
notebook starts it itself. The block below uses GHCi's :! shell escape to spin up a throwaway
cluster on port 54329 and load the two tables from the SQLite fixture. It needs postgres and
sqlite3 on PATH (e.g. brew install postgresql@16). A matching teardown runs at the end of the
section.
:! bash scripts/pg-setup.shpostgres ready on port 54329 (db chinook: artists + albums)
Open a connection (you supply persistent-postgresql; runPg is the usual runSqlConn wrapper):
-- cabal: build-depends: persistent, persistent-postgresql, transformers, resourcet, monad-logger
import Control.Monad.Trans.Reader (ReaderT)
import Control.Monad.Trans.Resource (ResourceT, runResourceT)
import Control.Monad.Logger (NoLoggingT, runNoLoggingT)
import Database.Persist.Sql (SqlBackend)
import Database.Persist.Postgresql (runSqlConn, withPostgresqlConn)
runPg :: ReaderT SqlBackend (ResourceT (NoLoggingT IO)) a -> IO a
runPg act =
runNoLoggingT (runResourceT (withPostgresqlConn "host=localhost port=54329 dbname=chinook user=postgres" (runSqlConn act)))Discovery and reads use the same functions as SQLite, just ...Conn wrapped in runPg:
runPg listTablesConn["albums","artists"]
D.toMarkdown' <$> runPg (describeTableConn "artists")
Column Name
TextType
TextSQLite Type
TextNullable
BoolPrimary Key
BoolArtistId Int integer False True Name Maybe Text text True False
D.toMarkdown' . D.take 5 <$> runPg (readTableConn "artists")
ArtistId
IntName
Maybe Text1 Just "AC/DC" 2 Just "Accept" 3 Just "Aerosmith" 4 Just "Alanis Morissette" 5 Just "Alice In Chains"
The typed reader is backend-agnostic too. The same ArtistsSchema generated from the SQLite file in
Tier 1 validates this PostgreSQL read, so it's one schema type across two databases:
D.toMarkdown' . D.take 3 . DT.thaw <$> runPg (readTableTypedConn @ArtistsSchema "artists")
ArtistId
IntName
Maybe Text1 Just "AC/DC" 2 Just "Accept" 3 Just "Aerosmith"
The Albums entity generated from SQLite at compile time is a plain persistent entity, so the
same declareEntity splice and selectToDataFrame run unchanged against PostgreSQL:
D.toMarkdown' <$> runPg (selectToDataFrame [AlbumsArtistId <-. [1, 4]] [Asc AlbumsTitle, LimitTo 5])
id
Inttitle
TextartistId
Int1 For Those About To Rock We Salute You 1 6 Jagged Little Pill 4 4 Let There Be Rock 1
Tear the throwaway cluster back down (again via :!):
:! bash scripts/pg-teardown.shpostgres stopped and removed
SQLite type affinities map to Haskell element types as below. NOT NULL columns are non-null,
everything else becomes Maybe. You can override a column with typeOverride in DeclareOptions
(declareTableWith / declareEntityWith).
| SQLite declared type | Haskell type |
|---|---|
INTEGER, INT... |
Int |
REAL, FLOAT, DOUBLE, NUMERIC, DECIMAL |
Double |
TEXT, VARCHAR, CHAR, CLOB |
Text |
BOOLEAN |
Bool |
BLOB / no declared type |
ByteString |
DATE |
Day |
DATETIME, TIMESTAMP |
UTCTime |
TIME |
TimeOfDay |
build-depends: dataframe, dataframe-persistent, persistent-sqlite
The classic path still works and is unchanged: hand-write a persistent entity with
share/persistLowerCase and derivePersistentDataFrame, then load with fromPersistent.