Categories
AWS

Run a data processing job on Amazon EMR Serverless with AWS Step Functions

Update, Feb 2023: AWS Step Functions has added direct integrations for 35 services, including Amazon EMR Serverless. In the current version of this blog, we use an AWS Lambda function to submit the job to EMR Serverless; now, you can submit an EMR Serverless job by invoking the APIs directly from a Step Functions workflow. Read more about this here.

There are several infrastructure as code (IaC) frameworks available today to help you define your infrastructure, such as the AWS Cloud Development Kit (AWS CDK) or Terraform by HashiCorp. Terraform, an AWS Partner Network (APN) Advanced Technology Partner and member of the AWS DevOps Competency, is an IaC tool similar to AWS CloudFormation that allows you to create, update, and version your AWS infrastructure. Terraform provides friendly syntax (similar to AWS CloudFormation) along with other features like planning (visibility into changes before they are applied), graphing, and the ability to break infrastructure configurations into smaller, reusable templates, which allows better maintainability and reusability. We use these capabilities and features of Terraform to build this solution’s infrastructure on AWS. Let’s get started!

In this post, we showcase how to build and orchestrate a Scala Spark application using Amazon EMR Serverless, AWS Step Functions, and Terraform. In this end-to-end solution, we run a Spark job on EMR Serverless that processes sample clickstream data in an Amazon Simple Storage Service (Amazon S3) bucket and stores the aggregation results in Amazon S3.

With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications. You continue to get the benefits of Amazon EMR, such as open source compatibility, concurrency, and optimized runtime performance for popular data frameworks. EMR Serverless is suitable for customers who want the ease of operating applications built on open-source frameworks. It offers quick job startup, automatic capacity management, and straightforward cost controls.
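As a hedged sketch of what the Lambda-based job submission boils down to (the application ID, role ARN, S3 paths, and class name below are placeholders, not values from the post), here is an EMR Serverless StartJobRun call using the AWS SDK for Java v2:

```java
import software.amazon.awssdk.services.emrserverless.EmrServerlessClient;
import software.amazon.awssdk.services.emrserverless.model.JobDriver;
import software.amazon.awssdk.services.emrserverless.model.SparkSubmit;
import software.amazon.awssdk.services.emrserverless.model.StartJobRunRequest;
import software.amazon.awssdk.services.emrserverless.model.StartJobRunResponse;

public class SubmitSparkJob {
    public static void main(String[] args) {
        try (EmrServerlessClient emr = EmrServerlessClient.create()) {
            // Spark driver: the Scala Spark JAR plus its input/output arguments (placeholders)
            SparkSubmit sparkSubmit = SparkSubmit.builder()
                    .entryPoint("s3://my-bucket/jars/clickstream-aggregator.jar")
                    .entryPointArguments("s3://my-bucket/input/", "s3://my-bucket/output/")
                    .sparkSubmitParameters("--class com.example.ClickstreamApp")
                    .build();

            // Submit the job run to an existing EMR Serverless application
            StartJobRunResponse response = emr.startJobRun(StartJobRunRequest.builder()
                    .applicationId("00fexampleapplid")                                          // placeholder
                    .executionRoleArn("arn:aws:iam::123456789012:role/emr-serverless-job-role") // placeholder
                    .jobDriver(JobDriver.builder().sparkSubmit(sparkSubmit).build())
                    .build());

            System.out.println("Started job run: " + response.jobRunId());
        }
    }
}
```

The direct Step Functions integration mentioned in the update above performs this same StartJobRun call without the intermediate Lambda function.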

Please read more on the blog here.

Categories
AWS

Build and Deploy a Microsoft .NET Core Web API application to AWS App Runner using CloudFormation

Container workload management tasks, such as managing deployments, scaling infrastructure, or keeping it updated, can get cumbersome. AWS App Runner is a great alternative for customers without any prior container or infrastructure experience, as it is a fully managed service that takes care of building and deploying your application, load balancing traffic, and autoscaling up or down per your application’s needs. App Runner retrieves your source code from GitHub or your source image from an Amazon ECR repository in your AWS account, and creates and maintains a running web service for you in the AWS Cloud.

In this blog we show you how to build a Microsoft .NET Web API application backed by an Amazon Aurora database and deploy it using AWS App Runner. AWS App Runner makes it easy for developers to quickly deploy containerized web applications and APIs, and lets us start from either source code or a container image.
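The post itself deploys the service with CloudFormation; purely as an illustrative sketch of the same CreateService operation through the AWS SDK for Java v2 (the service name, image URI, port, and role ARN are placeholders), it might look like this:

```java
import software.amazon.awssdk.services.apprunner.AppRunnerClient;
import software.amazon.awssdk.services.apprunner.model.AuthenticationConfiguration;
import software.amazon.awssdk.services.apprunner.model.CreateServiceRequest;
import software.amazon.awssdk.services.apprunner.model.CreateServiceResponse;
import software.amazon.awssdk.services.apprunner.model.ImageConfiguration;
import software.amazon.awssdk.services.apprunner.model.ImageRepository;
import software.amazon.awssdk.services.apprunner.model.ImageRepositoryType;
import software.amazon.awssdk.services.apprunner.model.SourceConfiguration;

public class CreateAppRunnerService {
    public static void main(String[] args) {
        try (AppRunnerClient appRunner = AppRunnerClient.create()) {
            CreateServiceResponse resp = appRunner.createService(CreateServiceRequest.builder()
                    .serviceName("dotnet-web-api") // placeholder
                    .sourceConfiguration(SourceConfiguration.builder()
                            .imageRepository(ImageRepository.builder()
                                    // Placeholder ECR image URI for the containerized .NET Web API
                                    .imageIdentifier("123456789012.dkr.ecr.us-east-1.amazonaws.com/dotnet-api:latest")
                                    .imageRepositoryType(ImageRepositoryType.ECR)
                                    .imageConfiguration(ImageConfiguration.builder().port("80").build())
                                    .build())
                            .autoDeploymentsEnabled(true) // redeploy automatically on new image pushes
                            .authenticationConfiguration(AuthenticationConfiguration.builder()
                                    .accessRoleArn("arn:aws:iam::123456789012:role/AppRunnerECRAccessRole") // placeholder
                                    .build())
                            .build())
                    .build());
            System.out.println("Service URL: https://" + resp.service().serviceUrl());
        }
    }
}
```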

Read more here

Categories
AWS

AWS Batch Application Orchestration using AWS Fargate

Many customers prefer to use Docker images with AWS Batch and AWS CloudFormation for cost-effective and faster processing of complex jobs. To run batch workloads in the cloud, customers have to consider various orchestration needs, such as queueing workloads, submitting them to a compute resource, prioritizing jobs, handling dependencies and retries, scaling compute, and tracking utilization and resource management. While AWS Batch simplifies all the queuing, scheduling, and lifecycle management for customers, and even provisions and manages compute in the customer account, customers continue to look for even more time-efficient and simpler workflows to get their application jobs up and running in minutes.
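Once the job queue and job definition exist, submitting work is a single API call. Here is a minimal sketch using the AWS SDK for Java v2; the job, queue, and job definition names are placeholders, not values from the post:

```java
import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.SubmitJobRequest;
import software.amazon.awssdk.services.batch.model.SubmitJobResponse;

public class SubmitBatchJob {
    public static void main(String[] args) {
        try (BatchClient batch = BatchClient.create()) {
            // Submit a job to a Fargate-backed job queue; Batch handles queueing,
            // scheduling, retries, and compute lifecycle (names are placeholders)
            SubmitJobResponse resp = batch.submitJob(SubmitJobRequest.builder()
                    .jobName("sample-batch-job")
                    .jobQueue("fargate-job-queue")
                    .jobDefinition("sample-job-definition")
                    .build());
            System.out.println("Submitted job: " + resp.jobId());
        }
    }
}
```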

Read more here.

Categories
AWS

Running a Kubernetes Job in Amazon EKS on AWS Fargate Using AWS Step Functions

In a previous AWS blog post, I shared an application orchestration process to run Amazon ECS tasks using AWS Step Functions. This blog is a continuation of that post, but here we run the same application on Amazon EKS as a Kubernetes job on Fargate, using Step Functions.

Amazon EKS provides the flexibility to support many container use cases, such as long-running jobs, web applications, microservices architectures, on-demand job execution, batch processing, and machine learning applications, with seamless integration with other AWS services. Kubernetes is an open-source container orchestration engine for automating the deployment, scaling, and management of containerized applications. The open-source project is hosted by the Cloud Native Computing Foundation (CNCF).
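As a hedged sketch of the workflow’s entry point (the state machine ARN and input below are placeholders; the post drives the Kubernetes job through Step Functions), starting the execution with the AWS SDK for Java v2 looks like this:

```java
import software.amazon.awssdk.services.sfn.SfnClient;
import software.amazon.awssdk.services.sfn.model.StartExecutionRequest;
import software.amazon.awssdk.services.sfn.model.StartExecutionResponse;

public class StartK8sJobWorkflow {
    public static void main(String[] args) {
        try (SfnClient sfn = SfnClient.create()) {
            // Start the state machine that submits the Kubernetes job to EKS on Fargate
            StartExecutionResponse resp = sfn.startExecution(StartExecutionRequest.builder()
                    .stateMachineArn("arn:aws:states:us-east-1:123456789012:stateMachine:eks-k8s-job") // placeholder
                    .input("{\"jobName\": \"sample-k8s-job\"}") // placeholder input
                    .build());
            System.out.println("Execution started: " + resp.executionArn());
        }
    }
}
```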

Read more here.

Categories
AWS

An example of running Amazon ECS tasks on AWS Fargate – Terraform (By HashiCorp)

AWS Fargate is a serverless compute engine that supports several common container use cases, like running microservices architecture applications, batch processing, machine learning applications, and migrating on-premises applications to the cloud, without having to manage servers or clusters of Amazon EC2 instances.

In this blog, we will walk you through a use case of running an Amazon ECS task on AWS Fargate that can be initiated using AWS Step Functions. We will use Terraform to model the AWS infrastructure. The example solution leverages Amazon ECS, a scalable, high-performance container management service that supports Docker containers, provisioned by Fargate to automatically scale, load balance, and manage the scheduling of your containers for availability. For defining the infrastructure, you can use AWS CloudFormation, AWS CDK, or Terraform by HashiCorp. In the solution presented in this post, we use Terraform by HashiCorp, an AWS Partner Network (APN) Advanced Technology Partner and member of the AWS DevOps Competency.
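Under the hood, the Step Functions task performs an ECS RunTask call. Purely as an illustrative sketch (the post models this in Terraform and Step Functions rather than application code; the cluster, task definition, and subnet values are placeholders), the equivalent call with the AWS SDK for Java v2 is:

```java
import software.amazon.awssdk.services.ecs.EcsClient;
import software.amazon.awssdk.services.ecs.model.AssignPublicIp;
import software.amazon.awssdk.services.ecs.model.AwsVpcConfiguration;
import software.amazon.awssdk.services.ecs.model.LaunchType;
import software.amazon.awssdk.services.ecs.model.NetworkConfiguration;
import software.amazon.awssdk.services.ecs.model.RunTaskRequest;
import software.amazon.awssdk.services.ecs.model.RunTaskResponse;

public class RunFargateTask {
    public static void main(String[] args) {
        try (EcsClient ecs = EcsClient.create()) {
            RunTaskResponse resp = ecs.runTask(RunTaskRequest.builder()
                    .cluster("sample-cluster")       // placeholder cluster
                    .taskDefinition("sample-task:1") // placeholder task definition and revision
                    .launchType(LaunchType.FARGATE)  // Fargate provisions the compute
                    .networkConfiguration(NetworkConfiguration.builder()
                            .awsvpcConfiguration(AwsVpcConfiguration.builder()
                                    .subnets("subnet-0example") // placeholder subnet
                                    .assignPublicIp(AssignPublicIp.ENABLED)
                                    .build())
                            .build())
                    .build());
            resp.tasks().forEach(t -> System.out.println("Started task: " + t.taskArn()));
        }
    }
}
```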

Read more here

Categories
AWS

AWS Certified Advanced Networking Specialty!

Excited to start this year with AWS Certified Advanced Networking Specialty!

As per a couplet in the Tamil language, “katrathu kaiman alavu kallathathu ulagalavu”, a rough translation could be something like
“known is a droplet and unknown is an ocean”.

I started this year with this droplet in my pursuit of learning something new!

Refer to my notes and learnings at https://www.cloudopsguru.com/#/Network

Thanks to

Categories
AWS

AWS Lambda PowerMockito – mocking static methods & DI

I thought of writing an article about mocking static methods in Java (there are quite a few available on the web), plugging in the combination of DI using Dagger and PowerMockito.

The example implements uploading and downloading/getting files from AWS S3, and saving and reading rows from DynamoDB. It provides JUnit testing using Mockito & PowerMockito:

1. Two sample lambdas (save and get) - to S3 and DynamoDB
2. Terraform implementation for deploying the lambda
3. JUnit for S3 and DynamoDB testing
4. Sample CLI commands to test
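As a minimal sketch of the core technique (the utility class and method here are hypothetical, not from the repo), mocking a static method with PowerMockito under JUnit 4 looks like this:

```java
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.when;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.mockito.PowerMockito;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

// Hypothetical utility with a static method (not from the repo)
class S3KeyUtil {
    static String bucketFor(String tenant) {
        return "prod-" + tenant; // real logic we want to bypass in tests
    }
}

@RunWith(PowerMockRunner.class)  // PowerMock's JUnit 4 runner
@PrepareForTest(S3KeyUtil.class) // instrument the class whose statics are mocked
public class S3KeyUtilTest {

    @Test
    public void staticMethodCanBeStubbed() {
        PowerMockito.mockStatic(S3KeyUtil.class);
        when(S3KeyUtil.bucketFor("acme")).thenReturn("test-bucket");

        assertEquals("test-bucket", S3KeyUtil.bucketFor("acme"));
    }
}
```

In the repo, the same idea is combined with Dagger-provided dependencies so the S3 and DynamoDB clients can be swapped for mocks in the JUnit tests.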

Read more at https://github.com/shivaramani/aws-lambda-mockito

Categories
AWS

Amazon S3 Expiration

Reliably deleting large S3 files using AWS Lambda & Amazon S3 expiration policies

Deleting large objects in Amazon S3 can be cumbersome and sometimes demands repetition and retries, owing to the large size (and number) of the files. One such scenario is handled today using hourly EMR jobs (one master and multiple nodes) operating on these S3 files. Due to the large size and number of files, the process runs for long hours and occasionally has to be retried after failures.

To make the deletion process more reliable and cheaper, we leveraged short compute executions using AWS Lambda in combination with S3 lifecycle policies. The Lambda function can be scheduled using a CloudWatch Events rule (or AWS Step Functions, Apache Airflow, etc.). In this situation, the SLA for the deletion of the files can be up to 48 hours.

This example provides the infrastructure and sample Java code (for the Lambda function) to delete the S3 files using a lifecycle policy, leveraging object-expiration techniques.
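The repo contains the actual Lambda code; as a hedged sketch of the approach (the bucket name, prefix, and expiry below are placeholders), applying such an expiration rule with the AWS SDK for Java v2 could look like this:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.BucketLifecycleConfiguration;
import software.amazon.awssdk.services.s3.model.ExpirationStatus;
import software.amazon.awssdk.services.s3.model.LifecycleExpiration;
import software.amazon.awssdk.services.s3.model.LifecycleRule;
import software.amazon.awssdk.services.s3.model.LifecycleRuleFilter;
import software.amazon.awssdk.services.s3.model.PutBucketLifecycleConfigurationRequest;

public class ApplyExpirationPolicy {
    public static void main(String[] args) {
        try (S3Client s3 = S3Client.create()) {
            // Expire (delete) objects under a prefix one day after creation;
            // bucket name and prefix are placeholders
            LifecycleRule rule = LifecycleRule.builder()
                    .id("expire-staged-files")
                    .filter(LifecycleRuleFilter.builder().prefix("to-delete/").build())
                    .expiration(LifecycleExpiration.builder().days(1).build())
                    .status(ExpirationStatus.ENABLED)
                    .build();

            s3.putBucketLifecycleConfiguration(PutBucketLifecycleConfigurationRequest.builder()
                    .bucket("my-bucket")
                    .lifecycleConfiguration(BucketLifecycleConfiguration.builder().rules(rule).build())
                    .build());
        }
    }
}
```

S3 then deletes the matching objects asynchronously, which is why the SLA mentioned above can be up to 48 hours.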

Please refer to https://github.com/shivaramani/lambda-s3-lifecycle-deletion-process for more details

Categories
AWS

AWS Lambda Alias Routing

Lambda Version Alias

This example shows how alias routing can be used in API Gateway and in Kinesis Data Streams (Lambda event source mapping to an alias).

https://github.com/shivaramani/lambdaversion

High-level steps

  • The solution has “src” and “templates”. src has the sample Java Lambda; templates has the .tf files.
  • Upon “mvn clean package”, the “target” folder will have the generated jar.
  • “exec.sh” has the AWS CLI/bash commands to deploy the Lambda function and modify the alias accordingly.
  1. Initial setup
    • Build the Java source.
    • Run the Terraform templates to create the API Gateway and Lambda. This pushes the above jar to the Lambda.
    • The created API points to the lambda:active alias. The “red” and “black” aliases are also created at this point.
  2. Modify the code
    • Run “exec.sh”. This is similar to Git pipelines deploying the function.
    • At this point the CLI sets the current version as “red” and the newly deployed code as “black”.
    • The CLI sets the routing to point to “black”, with an additional routing-config weight for the fallback (“red”), as sketched below.
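The repo’s exec.sh performs that final weighted-routing step with the AWS CLI; as a hedged Java rendering of the same UpdateAlias call via the AWS SDK for Java v2 (the function name and version numbers are placeholders), it could look like:

```java
import java.util.Map;

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.AliasRoutingConfiguration;
import software.amazon.awssdk.services.lambda.model.UpdateAliasRequest;

public class ShiftAliasTraffic {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            // Point the alias at the new version ("black"), keeping 10% of traffic
            // on the previous version ("red") as a fallback; names/versions are placeholders
            lambda.updateAlias(UpdateAliasRequest.builder()
                    .functionName("sample-function")
                    .name("active")
                    .functionVersion("2") // new "black" version
                    .routingConfig(AliasRoutingConfiguration.builder()
                            .additionalVersionWeights(Map.of("1", 0.10)) // old "red" version
                            .build())
                    .build());
        }
    }
}
```

Shifting the weight gradually toward the new version gives a simple canary-style rollout, with “red” as the fallback.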
Categories
AWS

AWS Certified Data Analytics – Specialty

Whoooh… knocked out one more certification. A real tough one, and practical: streaming and ETL-ing.

Captured my learning here: https://www.cloudopsguru.com/#/DataAnalytics

Thanks to Jayendra Patil, Siddharth Mehta, Tom Carpenter for their invaluable posts and lessons (Udemy)

www.jayendrapatil.com

https://www.udemy.com/course/aws-serverless-glue-redshift-spectrum-athena-quicksight-training/

https://www.udemy.com/course/total-aws-certified-database-specialty-exam-prep-dbs-c01