Categories
AWS

Run a data processing job on Amazon EMR Serverless with AWS Step Functions

Update Feb 2023: AWS Step Functions adds direct integration for 35 services including Amazon EMR Serverless. In the current version of this blog, we are using an AWS Lambda function to submit the job to EMR Serverless. Now, you can submit an EMR Serverless job by invoking the APIs directly from a Step Functions workflow. Read more about this here.

There are several infrastructure as code (IaC) frameworks available today, to help you define your infrastructure, such as the AWS Cloud Development Kit (AWS CDK) or Terraform by HashiCorp. Terraform, an AWS Partner Network (APN) Advanced Technology Partner and member of the AWS DevOps Competency, is an IaC tool similar to AWS CloudFormation that allows you to create, update, and version your AWS infrastructure. Terraform provides friendly syntax (similar to AWS CloudFormation) along with other features like planning (visibility to see the changes before they actually happen), graphing, and the ability to create templates to break infrastructure configurations into smaller chunks, which allows better maintenance and reusability. We use the capabilities and features of Terraform to build an API-based ingestion process into AWS. Let’s get started!

In this post, we showcase how to build and orchestrate a Scala Spark application using Amazon EMR ServerlessAWS Step Functions, and Terraform. In this end-to-end solution, we run a Spark job on EMR Serverless that processes sample clickstream data in an Amazon Simple Storage Service (Amazon S3) bucket and stores the aggregation results in Amazon S3.

With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications. You will continue to get the benefits of Amazon EMR, such as open source compatibility, concurrency, and optimized runtime performance for popular data frameworks. EMR Serverless is suitable for customers who want ease in operating applications using open-source frameworks. It offers quick job startup, automatic capacity management, and straightforward cost controls.

Please read here more on the blog

By shivaramani

I am passionate developer, application architect. Like to try different programming languages, especially Microsoft workloads in AWS. Enjoys learning about different AWS services and focuses on developer experience for building applications using compute, containers & serverless architecture.

I like to build experience around software development(SDLC), infrastructure as code, automating and providing a CI/CD solution

Leave a Reply

Your email address will not be published. Required fields are marked *