quiver/duckdb-aws-lambda-python-sam-template


AWS Lambda Python (Image/ECR) for CSV Processing with DuckDB

This repository contains an AWS SAM application that uses a Python-based AWS Lambda function (Image/ECR) to process CSV files stored in Amazon S3. The function uses DuckDB for efficient in-memory data exploration and analytics.

Overview

  • The application is written in Python and runs on AWS Lambda.
  • It uses DuckDB to query and analyze CSV data from your S3 bucket.
  • You can build and deploy it using the AWS SAM CLI.
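As a rough illustration of the flow described above, here is a minimal sketch of what such a handler could look like. It is not the repository's actual code: the names `build_query` and `lambda_handler`, the `LIMIT 10` preview query, and reading `S3URI` from an environment variable are all assumptions made for this example.

```python
import json
import os


def build_query(s3_uri: str, limit: int = 10) -> str:
    """Return a SQL statement that previews rows of the CSV at s3_uri (hypothetical helper)."""
    # Escape single quotes so the URI is a valid SQL string literal.
    escaped = s3_uri.replace("'", "''")
    return f"SELECT * FROM read_csv_auto('{escaped}') LIMIT {int(limit)}"


def lambda_handler(event, context):
    # Imported lazily so this module can be loaded where duckdb is not installed.
    import duckdb

    # Assumption: the S3URI template parameter is exposed as an environment variable.
    s3_uri = os.environ["S3URI"]

    con = duckdb.connect(database=":memory:")
    # httpfs provides s3:// access.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # The credential chain lets DuckDB pick up the Lambda execution role's credentials.
    con.execute("CREATE SECRET (TYPE S3, PROVIDER CREDENTIAL_CHAIN)")

    rows = con.execute(build_query(s3_uri)).fetchall()
    # default=str keeps dates and decimals JSON-serializable.
    return {"statusCode": 200, "body": json.dumps(rows, default=str)}
```

Keeping the SQL construction in a small pure function makes it possible to check the query text locally without AWS credentials or a DuckDB installation.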

Prerequisites

  1. AWS CLI installed and configured.
  2. AWS SAM CLI installed.
  3. A CSV file uploaded to your S3 bucket.

Pre-Deployment Steps

  1. Upload the CSV File: Upload your CSV file to an S3 bucket of your choice. For example:
    s3://YOUR-BUCKET/FILENAME.csv
    
  2. Modify template.yml: In the template.yml file, locate the parameter S3URI and update it to match your bucket and file name. For example:
    S3URI: s3://YOUR-BUCKET/FILENAME.csv
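Before deploying, it can help to confirm that the value you put in S3URI has the expected `s3://BUCKET/KEY` shape. The helper below is a small sketch for that purpose. `parse_s3_uri` is a hypothetical name and is not part of this repository.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://BUCKET/KEY URI into (bucket, key), raising ValueError if malformed."""
    parsed = urlparse(uri)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError(f"expected s3://BUCKET/KEY, got {uri!r}")
    return bucket, key


if __name__ == "__main__":
    print(parse_s3_uri("s3://YOUR-BUCKET/FILENAME.csv"))
```

This only checks the syntax of the URI. It does not verify that the bucket or object exists.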

Build and Deploy

The following commands will build and deploy the application using the AWS SAM CLI.

  1. Build the Application

    $ sam build
    
  2. Deploy the Application

    $ sam deploy --guided

    Once the deployment is complete, SAM will provide outputs such as the Lambda function ARN.

  3. Invoke Lambda Function

    $ aws lambda invoke \
     --function-name arn:aws:lambda:ap-northeast-1:123456789012:function:duck-DuckDBLambdaFunction \
     --payload '{}' \
     response.json
    {
     "StatusCode": 200,
     "ExecutedVersion": "$LATEST"
    }
    
    $ cat response.json
    
    ... SQL OUTPUT GOES HERE...
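The same invocation can also be done from Python. The sketch below assumes boto3 is installed and credentials are configured. `invoke_function` is a hypothetical wrapper, and the optional `client` parameter exists only so the function can be exercised with a stand-in client.

```python
import json


def invoke_function(function_name: str, payload: dict, client=None) -> tuple[int, str]:
    """Invoke a Lambda function synchronously and return (status_code, response_body)."""
    if client is None:
        # Imported lazily so the helper can be tested without boto3 installed.
        import boto3

        client = boto3.client("lambda")

    response = client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # The response payload is a stream; read it fully and decode it as text.
    body = response["Payload"].read().decode("utf-8")
    return response["StatusCode"], body
```

For example, `invoke_function("arn:aws:lambda:ap-northeast-1:123456789012:function:duck-DuckDBLambdaFunction", {})` would correspond to the CLI call above, with the returned body playing the role of response.json.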
    

Important Notes on DuckDB and Lambda

  • AWS Lambda does not set a HOME environment variable by default.
  • DuckDB installs extensions inside the HOME directory.
  • Therefore, to ensure that extensions install successfully, we set HOME to /tmp, Lambda's writable directory, in the Lambda function's environment variables.
  • Otherwise, you'll encounter errors like the following (see duckdb/duckdb #3855):
    D INSTALL httpfs;
    IO Error: Can't find the home directory at ''
    Specify a home directory using the SET home_directory='/path/to/dir' option.
    
    ...
    
    D CREATE SECRET (
          TYPE S3,
          PROVIDER CREDENTIAL_CHAIN
      );
    Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'aws':
    Can't find the home directory at ''
    Specify a home directory using the SET home_directory='/path/to/dir' option.
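This template sets HOME through the function's environment variables. As an alternative sketch, the same guard could be applied in Python before DuckDB installs any extension. `ensure_home` is a hypothetical helper, and doing this in code instead of in template.yml is an assumption of this example, not the repository's approach.

```python
import os


def ensure_home(env=os.environ, fallback: str = "/tmp") -> str:
    """Set HOME to fallback if it is missing or empty; return the value in effect."""
    if not env.get("HOME"):
        env["HOME"] = fallback
    return env["HOME"]


if __name__ == "__main__":
    # Call this before connecting to DuckDB or installing extensions.
    print(ensure_home())
```

As the error message itself suggests, running `SET home_directory='/tmp'` on the DuckDB connection before `INSTALL` is another option.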
    
