Reliably deleting large S3 files using AWS Lambda and Amazon S3 expiration policies
Deleting large objects in Amazon S3 can be cumbersome and often requires retries because of the size and number of the files involved. One such scenario is handled today by an hourly EMR job (one master node and multiple worker nodes) that operates on these S3 files. Because of the large size and number of files, the process runs for many hours and occasionally has to be retried after failures.
To make the deletion process more reliable and cheaper, we replaced the long-running compute with short AWS Lambda executions combined with Amazon S3 lifecycle policies. The Lambda can be scheduled using a CloudWatch Events rule (or orchestrated with AWS Step Functions, Apache Airflow, etc.). In this scenario the SLA for deleting the files can be up to 48 hours.
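As a sketch of what such a lifecycle policy might look like (the rule ID, tag key/value, and one-day expiration window below are illustrative assumptions, not the repository's actual configuration), an expiration rule scoped to tagged objects can be expressed as:

```json
{
  "Rules": [
    {
      "ID": "expire-objects-marked-for-deletion",
      "Filter": {
        "Tag": { "Key": "to-delete", "Value": "true" }
      },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    }
  ]
}
```

Note that S3 evaluates lifecycle rules roughly once a day and actual object removal can lag the expiration date, which is why an SLA of up to 48 hours fits this approach well.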
This example provides the infrastructure and sample Java code (for the Lambda) to delete the S3 files using a lifecycle policy that leverages object expiration.
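A minimal sketch of what the Lambda side might look like with the AWS SDK for Java 2.x (the bucket name, tag key/value, rule ID, and one-day expiration are hypothetical; the repository's actual code and configuration may differ):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.*;

public class LifecycleDeletionHandler {

    // Hypothetical names for illustration only.
    private static final String BUCKET = "my-data-bucket";
    private static final String TAG_KEY = "to-delete";
    private static final String TAG_VALUE = "true";

    private final S3Client s3 = S3Client.create();

    /** Install (or overwrite) an expiration rule that deletes tagged objects after one day. */
    public void applyExpirationRule() {
        LifecycleRule rule = LifecycleRule.builder()
                .id("expire-objects-marked-for-deletion")
                .filter(LifecycleRuleFilter.builder()
                        .tag(Tag.builder().key(TAG_KEY).value(TAG_VALUE).build())
                        .build())
                .status(ExpirationStatus.ENABLED)
                .expiration(LifecycleExpiration.builder().days(1).build())
                .build();

        s3.putBucketLifecycleConfiguration(PutBucketLifecycleConfigurationRequest.builder()
                .bucket(BUCKET)
                .lifecycleConfiguration(BucketLifecycleConfiguration.builder()
                        .rules(rule)
                        .build())
                .build());
    }

    /** Mark a single object for expiration by tagging it; S3 then deletes it asynchronously. */
    public void markForDeletion(String key) {
        s3.putObjectTagging(PutObjectTaggingRequest.builder()
                .bucket(BUCKET)
                .key(key)
                .tagging(Tagging.builder()
                        .tagSet(Tag.builder().key(TAG_KEY).value(TAG_VALUE).build())
                        .build())
                .build());
    }
}
```

The design choice here is that the Lambda only tags objects and maintains the lifecycle rule; the heavy lifting of the actual deletion is offloaded to S3's expiration machinery, so the Lambda stays well within its execution time limit regardless of how many or how large the files are.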
Please refer to https://github.com/shivaramani/lambda-s3-lifecycle-deletion-process for more details.