This repository contains an AWS SAM application that uses a Python-based AWS Lambda function (packaged as a container image stored in Amazon ECR) to process CSV files stored in Amazon S3. A key feature of this application is its use of DuckDB for efficient in-memory data exploration and analytics.
- The application is written in Python and runs on AWS Lambda.
- It uses DuckDB to query and analyze CSV data from your S3 bucket.
- You can build and deploy it using the AWS SAM CLI.
Prerequisites:
- AWS CLI installed and configured.
- AWS SAM CLI installed.
- A CSV file uploaded to your S3 bucket.
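The function's source is not shown here, so the following is only a minimal sketch of how a handler along these lines might query the CSV with DuckDB's Python API. It assumes that the `S3URI` template parameter reaches the function as an environment variable of the same name, that `HOME` is set to `/tmp` as described in this README, and that the `duckdb` package is installed in the container image. The names `build_query` and `lambda_handler`, and the shape of the returned JSON, are hypothetical.

```python
import json
import os


def build_query(s3_uri: str, limit: int = 10) -> str:
    """Return a DuckDB SELECT that reads the CSV at `s3_uri` (hypothetical helper)."""
    # Double any single quotes so the URI is a valid SQL string literal.
    literal = s3_uri.replace("'", "''")
    # read_csv_auto infers the delimiter, header, and column types.
    return f"SELECT * FROM read_csv_auto('{literal}') LIMIT {int(limit)}"


def lambda_handler(event, context):
    """Hypothetical handler: query the CSV named by the S3URI env var."""
    # Assumption: the S3URI template parameter is passed in as an env var.
    s3_uri = os.environ["S3URI"]

    # Imported here so this module can be loaded without DuckDB installed;
    # the duckdb package is assumed to be present in the container image.
    import duckdb

    con = duckdb.connect()  # no path given -> in-memory database
    try:
        # Assumes HOME=/tmp is set in the function's environment so DuckDB
        # can install extensions; httpfs reads s3:// URLs and aws provides
        # the CREDENTIAL_CHAIN secret provider.
        for ext in ("httpfs", "aws"):
            con.execute(f"INSTALL {ext}")
            con.execute(f"LOAD {ext}")
        # Pick up the execution role's credentials from the environment.
        con.execute("CREATE SECRET (TYPE S3, PROVIDER CREDENTIAL_CHAIN)")
        rows = con.execute(build_query(s3_uri)).fetchall()
    finally:
        con.close()
    # Round-trip through JSON (default=str) so dates, decimals, etc. become
    # strings and the result is serializable by the Lambda runtime.
    return {"rows": json.loads(json.dumps(rows, default=str))}
```

Importing `duckdb` inside the handler is a design choice for the sketch only; in a real image a module-level import (and a connection reused across warm invocations) would avoid repeating the setup on every call.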
- Upload the CSV File
Upload your CSV file to an S3 bucket of your choice. For example:
`s3://YOUR-BUCKET/FILENAME.csv`
- Modify `template.yml`
In the `template.yml` file, locate the parameter `S3URI` and update it to match your bucket and file name. For example: `S3URI: s3://YOUR-BUCKET/FILENAME.csv`
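Before deploying, it can help to check that the `S3URI` value actually names a bucket and an object key, and to upload the CSV to exactly that location. The sketch below is one way to do both from Python; `split_s3_uri` and `upload_csv` are hypothetical helpers, and `upload_csv` assumes it is given a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method performs the upload.

```python
from __future__ import annotations

from typing import Any


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key) (hypothetical helper).

    Raises ValueError if the string is not an s3:// URI naming an object.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"S3URI must start with {prefix!r}: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3URI must name both a bucket and a key: {uri!r}")
    return bucket, key


def upload_csv(s3_client: Any, local_path: str, uri: str) -> tuple[str, str]:
    """Upload a local CSV to the object named by `uri` (hypothetical helper).

    `s3_client` is expected to be a boto3 S3 client.
    """
    bucket, key = split_s3_uri(uri)
    s3_client.upload_file(local_path, bucket, key)
    return bucket, key
```

For example, `split_s3_uri("s3://YOUR-BUCKET/FILENAME.csv")` returns `("YOUR-BUCKET", "FILENAME.csv")`, and `upload_csv(boto3.client("s3"), "FILENAME.csv", "s3://YOUR-BUCKET/FILENAME.csv")` does roughly what `aws s3 cp FILENAME.csv s3://YOUR-BUCKET/FILENAME.csv` does.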
The following commands will build and deploy the application using the AWS SAM CLI.
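- Build the Application
```shell
$ sam build
```
- Deploy the Application
```shell
$ sam deploy --guided
```
Once the deployment is complete, SAM displays outputs such as the Lambda function ARN.
- Invoke the Lambda Function
Use the function ARN from the deployment outputs:
```shell
$ aws lambda invoke \
    --function-name arn:aws:lambda:ap-northeast-1:123456789012:function:duck-DuckDBLambdaFunction \
    --payload '{}' \
    response.json
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
$ cat response.json
... SQL OUTPUT GOES HERE...
```
If you use AWS CLI version 2, add `--cli-binary-format raw-in-base64-out` to the `invoke` command; otherwise the CLI treats the `--payload` value as base64 and rejects `'{}'`.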
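The same synchronous invocation can be made from Python. This is a minimal sketch, assuming you pass in a boto3 Lambda client (for example `boto3.client("lambda", region_name="ap-northeast-1")`); `invoke_duckdb_function` is a hypothetical name, while `invoke` with `FunctionName`, `InvocationType`, and `Payload`, and the `Payload` and `FunctionError` response fields, are part of the boto3 Lambda client.

```python
import json
from typing import Any, Optional


def invoke_duckdb_function(
    lambda_client: Any, function_name: str, payload: Optional[dict] = None
) -> Any:
    """Invoke the function synchronously and return its decoded JSON result.

    Hypothetical helper; `lambda_client` is expected to be a boto3 Lambda
    client. The empty default payload mirrors `--payload '{}'` in the CLI call.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result, like the CLI
        Payload=json.dumps(payload or {}).encode("utf-8"),
    )
    # The response Payload is a streaming body; read and decode it.
    body = json.loads(response["Payload"].read())
    # FunctionError is set when the handler raised; the payload then holds
    # the error details rather than a normal result.
    if response.get("FunctionError"):
        raise RuntimeError(f"Lambda returned {response['FunctionError']}: {body}")
    return body
```

Raising on `FunctionError` matters because Lambda still reports `StatusCode` 200 for a synchronous invocation whose handler failed; the failure is only visible in that field and in the payload.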
- AWS Lambda does not set a `HOME` environment variable by default.
- DuckDB installs extensions inside the `HOME` directory.
- Therefore, to ensure that extensions install successfully, we set `HOME` to `/tmp` (the writable scratch directory in the Lambda execution environment) in the Lambda function's environment variables.
- Otherwise, you'll encounter errors like the following (see duckdb/duckdb#3855):
```
D INSTALL httpfs;
IO Error: Can't find the home directory at ''
Specify a home directory using the SET home_directory='/path/to/dir' option.
...
D CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'aws':
Can't find the home directory at ''
Specify a home directory using the SET home_directory='/path/to/dir' option.
```
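This application fixes the problem by setting `HOME` in the function's environment variables. If you would rather handle it in code, the sketch below shows two possible alternatives; both function names are hypothetical. `ensure_home_dir` sets `HOME` for the process and should run before DuckDB first needs the home directory (for example, before `INSTALL httpfs`). `set_duckdb_home` applies the `SET home_directory` option that the error message itself suggests, on an already-open DuckDB connection.

```python
import os


def ensure_home_dir(default: str = "/tmp") -> str:
    """Hypothetical helper: give the process a HOME if Lambda left it unset."""
    # Lambda does not set HOME, and /tmp is the writable scratch directory
    # in its execution environment, so fall back to it when HOME is
    # missing or empty. Call this before DuckDB first installs an extension.
    if not os.environ.get("HOME"):
        os.environ["HOME"] = default
    return os.environ["HOME"]


def set_duckdb_home(con, path: str = "/tmp") -> None:
    """Hypothetical helper: set DuckDB's home directory on one connection.

    Uses the `SET home_directory='/path/to/dir'` option named in DuckDB's
    error message; `con` is expected to be a duckdb connection.
    """
    literal = path.replace("'", "''")  # escape quotes for the SQL literal
    con.execute(f"SET home_directory='{literal}'")
```

The environment-variable route covers every connection the process opens, while the `SET` route is scoped to a single connection; the template-level `HOME=/tmp` setting used here has the same effect as the first option without any code.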