Amazon S3 Expiration

Reliably deleting large S3 files using AWS Lambda and Amazon S3 expiration policies

Deleting large objects in Amazon S3 can be cumbersome and sometimes requires repeated attempts and retries, largely because of the size (and number) of the files involved. One such scenario is handled today by an hourly EMR job (one master and multiple nodes) that operates on these S3 files. Because the files are large and numerous, the job runs for many hours and occasionally has to be retried after failures.

To make the deletion process more reliable and cheaper, we used short compute executions in AWS Lambda in combination with Amazon S3 lifecycle policies. The Lambda can be scheduled with a CloudWatch Events rule (or with AWS Step Functions, Apache Airflow, etc.). In this situation, the SLA for deleting the files can be up to 48 hours.
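One way to see why a window of up to 48 hours fits this approach: S3 computes when an object becomes eligible for expiration by adding the rule's number of days to the object's creation time and rounding the result up to the next midnight UTC. The sketch below is only an illustration of that arithmetic, assuming a 1-day rule; the class and method names are hypothetical and are not taken from the linked repository. Actual removal happens asynchronously after eligibility, so it can lag further.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: estimates when an object becomes eligible for
// expiration under an S3 lifecycle rule with Expiration.Days = days.
public class ExpirationWindow {

    static Instant eligibleAt(Instant created, int days) {
        // Add the rule's day count to the creation time (in UTC).
        ZonedDateTime shifted = created.atZone(ZoneOffset.UTC).plusDays(days);
        // Round up to the next midnight UTC. Assumption: a time that is
        // already exactly midnight is left unchanged.
        ZonedDateTime midnight = shifted.truncatedTo(ChronoUnit.DAYS);
        if (!midnight.equals(shifted)) {
            midnight = midnight.plusDays(1);
        }
        return midnight.toInstant();
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2024-01-15T10:30:00Z");
        Instant eligible = eligibleAt(created, 1);
        long hours = Duration.between(created, eligible).toHours();
        // Prints: 2024-01-17T00:00:00Z (37 hours after creation)
        System.out.println(eligible + " (" + hours + " hours after creation)");
    }
}
```

With a 1-day rule, an object created just after midnight UTC waits almost the full 48 hours before becoming eligible, while one created late in the day waits only a little over 24.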

This example provides infrastructure and sample Java code (for the Lambda) to delete the S3 files using a lifecycle policy that relies on object expiration.
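For orientation, an expiration rule has roughly the shape sketched below. This is a minimal, standard-library-only illustration that builds the JSON document accepted by `aws s3api put-bucket-lifecycle-configuration`; it is not the code from the repository, and the class name, rule ID, and `staging/` prefix are placeholders chosen for the example. A Lambda would more likely apply an equivalent rule through the AWS SDK.

```java
// Hypothetical sketch: builds a lifecycle configuration with one rule that
// expires every object under a prefix after a given number of days.
public class LifecycleRuleSketch {

    static String expirationRuleJson(String ruleId, String prefix, int days) {
        // Note: inputs are inserted as-is; a real implementation should
        // JSON-escape them or use a JSON library.
        return String.format(
            "{\"Rules\":[{\"ID\":\"%s\",\"Filter\":{\"Prefix\":\"%s\"},"
                + "\"Status\":\"Enabled\",\"Expiration\":{\"Days\":%d}}]}",
            ruleId, prefix, days);
    }

    public static void main(String[] args) {
        // Placeholder rule ID and prefix, for illustration only.
        System.out.println(expirationRuleJson("expire-staging-files", "staging/", 1));
    }
}
```

Saved to a file, the output could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://rule.json`. Once the rule is in place, S3 removes matching objects on its own, so no long-running compute is needed to delete them.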

Please refer to https://github.com/shivaramani/lambda-s3-lifecycle-deletion-process for more details.

By shivaramani

I am a passionate developer and application architect. I like to try different programming languages, especially Microsoft workloads in AWS. I enjoy learning about different AWS services and focus on developer experience for building applications using compute, containers, and serverless architecture.

I like to build experience around the software development life cycle (SDLC), infrastructure as code, automation, and CI/CD solutions.
