AWS Certified AI Practitioner – Tips

Howdy! I was able to crack the AWS Certified AI Practitioner certification recently.

AWS Services Notes

Amazon SageMaker

End-to-end managed service to prepare data and build, train, and deploy machine learning (ML) models

SageMaker Studio

Single, web-based visual interface to perform all ML development steps

Prepare data; build, train, and deploy models; upload data; create new notebooks; train and tune models; move back and forth between steps to adjust experiments; compare results; and deploy models to production

All ML development activities, including notebooks, experiment management, automatic model creation, debugging and profiling, and model drift detection, can be performed within the unified SageMaker Studio visual interface.

SageMaker Data Wrangler

For data preparation, transformation, and feature engineering
Prep tabular and image data for ML
Single interface for data selection, cleansing, exploration, visualization, and processing
SQL support and a data quality tool

Use case – music dataset, song ratings, listening duration

SageMaker Canvas

No-code interface
Build, tune, and train models using a visual interface
Build your own custom model using AutoML
Leverage Data Wrangler

Visual drag-and-drop service that allows business analysts to build ML models and generate accurate predictions without writing any code or requiring ML expertise.

Use case – sentiment analysis

SageMaker Clarify

For data preparation.
Evaluate foundation models – compare Model A vs. Model B using human factors
Use built-in datasets or bring your own dataset
Built-in metrics and algorithms

Model explainability – debug predictions to increase trust in and understanding of the model

Identify potential bias
Bring your own employees or use AWS employees

Detect bias
Specify input features, and bias will be automatically detected

SageMaker Feature Store

Store, share and manage features of ML models

SageMaker Ground Truth, SageMaker Ground Truth Plus

For RLHF (reinforcement learning from human feedback): model review, customization, and evaluation

Identify raw data, such as images, text files, and videos, and add informative labels to create high-quality training datasets for your ML models

SageMaker Studio Notebooks

Jupyter notebooks in SageMaker for the complete ML development workflow

SageMaker Studio Lab

ML development environment that provides compute, storage (up to 15 GB), and security

SageMaker HyperPod

Train models

Purpose-built to accelerate foundation model (FM) training

SageMaker Experiments

Organize and track iterations of ML models

SageMaker Debugger

Captures real-time metrics during training

Monitors CPUs, GPUs, network, and memory

SageMaker Serverless Inference

Deploy and scale ML models

SageMaker Edge Manager

Optimize, secure, monitor, and maintain ML models on fleets of edge devices

Ex: smart cameras, robots, personal computers, and mobile devices

SageMaker Neo

After training, use Neo to compile the model: train once and run anywhere in the cloud and at the edge

Supports the most popular DL models – AlexNet, ResNet, VGG, Inception, MobileNet, SqueezeNet, and DenseNet models trained in MXNet and TensorFlow, and classification and random cut forest models trained in XGBoost

SageMaker Model Monitor

Monitors the quality of Amazon SageMaker machine learning models

Monitors data quality, model quality, bias drift for models, and feature attribution drift for models

Get alerts for deviations, then either fix the model or retrain it

Continuous – real-time endpoint
Continuous – batch transform job
Scheduled – asynchronous batch transform jobs

SageMaker Model Registry

Centralized repository to track, manage, and version models
Catalog models, manage model versions, associate metadata with a model
Manage the approval status of a model, automate model deployment, share models

SageMaker Pipelines

Automates the process of building, training, and deploying models
Step types – Processing, Training, Tuning, AutoML, Model, ClarifyCheck, QualityCheck

SageMaker Feature Store

For sharing and managing variables (features) across multiple teams during model development

Ingests features from a variety of sources
Can publish features directly from SageMaker Data Wrangler into Feature Store

SageMaker Model Cards

Provide model documentation, not feature management.

Use case – intended uses, risk ratings and training details

SageMaker JumpStart

ML hub to find pretrained foundation models, computer vision models, or NLP models for quickly deploying and consuming a foundation model (FM) within a team’s VPC
Models can be fully customized, or you can access prebuilt solutions and deploy them

Provides access to a wide range of pre-trained models and solutions that can be easily deployed and consumed within a VPC. Designed to simplify and accelerate the deployment of machine learning models, including foundation models.

SageMaker Role Manager

Define roles for personas

Ex: data scientist, analyst

Amazon Bedrock

Amazon Q

Generative AI-powered assistant for accelerating software development and leveraging companies’ internal data

Q Developer

Helps with everything from coding, testing, and upgrading applications to diagnosing errors, performing security scanning and fixes, and optimizing AWS resources

Q Business

Generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems

Q for QuickSight

Unified business intelligence (BI)

Get multi-visual Q&A responses and AI-driven executive summaries of dashboards, and create detailed and customizable data stories highlighting key insights, trends, and drivers

Customers get a generative BI assistant that allows business analysts to use natural language to build BI dashboards in minutes and easily build visualizations and complex calculations

Q for Connect

Uses the real-time conversation with the customer, along with relevant company content, to automatically recommend what to say or what actions an agent should take to better assist customers

Q for Supply Chain

Inventory managers, supply and demand planners, and others will be able to ask and get intelligent answers about what is happening in their supply chain, why it is happening, and what actions to take. They will also be able to explore what-if scenarios to understand the trade-offs between different supply chain choices.

Amazon Comprehend

Detects language, extracts key phrases
Custom classifier – organize documents into categories

Analyzes text using tokenization
Supports text, PDF, Word, images, etc.

Text and Documents

Ex: analyze emails; group articles by the topics Comprehend uncovers
Use case: custom entities – analyze text for specific terms, such as a list of entities

Sentiment analysis

Amazon Translate

Natural and accurate language translation

Custom terminology – CSV, TSV, or TMX files

Text and Documents

Use cases – websites and applications for international users
HTML/text documents from S3

Amazon Textract

Extracts text, handwriting, and data from any scanned document using AI/ML

Ex: scan an image and read the text

Text and Documents

Use cases – financial services, healthcare, public sector (health forms, etc.)

Amazon Rekognition

Find objects, people, text, and scenes in images and videos

Custom Labels – identify your own images or logos. Ex: NFL

Content moderation – detect inappropriate, unwanted, or offensive content

Custom Moderation Adapters – extend Rekognition capabilities by providing your own labeled set of images

Use cases – labeling, content moderation, text detection, face detection and analysis (gender), celebrity recognition

Filter out harmful images

Amazon Kendra

Document search service
Extract answers from docs – text, PDF, HTML, PowerPoint, Word, etc.
Natural language search capabilities
Creates a knowledge index, powered by ML internally


Amazon Lex

Conversational AI using voice and text, in multiple languages
Integrates with Lambda, Connect, Comprehend, Kendra


Amazon Polly

Convert text to speech

Lexicons – define how to read certain pieces of text
Ex: AWS => Amazon Web Services

SSML (Speech Synthesis Markup Language) – mark up how the text should be pronounced

Voice engines – generative, neural, long-form

Speech marks – encode where a sentence or word starts or ends in the audio
Ex: lip syncing, or highlighting words as they are spoken

Amazon Transcribe

Convert speech to text
Uses a deep learning process called automatic speech recognition (ASR)
Removes PII using redaction
Supports automatic language identification for multilingual audio

Custom vocabularies – capture domain-specific or non-standard terms; give hints to increase recognition

Custom language models (for context) – for domain-specific content

Use cases – customer service calls, automated closed captioning/subtitling, generating metadata for media assets to create a fully searchable archive

Can transcribe multiple languages at the same time

Amazon Personalize

Ex: retail stores, media and entertainment


AWS DeepRacer

Console to train and evaluate deep reinforcement learning (RL) models

Amazon Forecast

Uses ML to deliver highly accurate forecasts

Use case – predict future sales
Product demand planning, financial planning, resource planning

Amazon Mechanical Turk

Crowdsourcing marketplace
Distributed virtual workforce
Integrates with Amazon A2I, SageMaker Ground Truth, etc.

Use case – label 1,000,000 images
Data collection, business processing, etc.

Amazon Augmented AI (A2I)

Human oversight of machine learning predictions in production

Reviewers can be your own employees or AWS/contractors

Amazon Comprehend Medical and Transcribe

Amazon’s Hardware for AI

AWS Trainium – Trn1 instances

AWS Inferentia – ML chip built to deliver high inference throughput at up to 70% cost reduction

EC2 user data/firewall

EC2 GPU instances – P3, P4, P5, …, G3, …, G6


Machine Learning Notes


Supervised Learning

Linear Regression

Models the relationship between one or more input features and one output variable (the target)

Ex: historical sales data as input, number of units to be produced as output
Predict house prices, stock prices, sales volume, etc.
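To make "model the relationship between input and target" concrete, here is a minimal sketch of simple linear regression (one input feature) fitted with ordinary least squares in plain Python. The function names and the toy data are hypothetical, chosen so the fit is exact; real projects would use a library instead of hand-rolled math.

```python
# Hypothetical helpers and toy data, for illustration only.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x) (the n terms cancel out).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    return slope * x + intercept

# Hypothetical history: an input feature (x) vs. units sold (y), exactly y = 2x + 1.
feature = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(feature, units)
print(slope, intercept)                 # 2.0 1.0
print(predict(slope, intercept, 5.0))   # 11.0
```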

Binary classification

Binary outcome yes/no, true/false, +/-

Time series prediction

Forecasts future values based on past and present data

Regression

Estimates a continuous numerical value based on the input features

Recurrent neural network (RNN)

A type of neural network that can process sequential data; suited for predicting future events based on past observations
NOTE: CNN is for images and RNN is for time series.

Ex: forecasting engine failures based on sensor readings

TensorFlow, PyTorch, Keras, MXNet

Convolutional neural network (CNN)

Classify an object amongst a group
NOTE: CNN is for images and RNN is for time series.

Ex: take an animal image as input and output a probability distribution of how likely it is to be each of 10 types of animals

Softmax function transforms arbitrary real values into the range (0, 1) so that they sum to 1 and can be read as probabilities
TensorFlow, PyTorch, Keras, MXNet
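A minimal sketch of the softmax step a CNN classifier typically applies to its final-layer scores. The scores are made up, and only 3 classes are used instead of 10 to keep the output readable.

```python
import math

def softmax(logits):
    """Map arbitrary real scores to probabilities in (0, 1) that sum to 1."""
    # Subtracting the max avoids overflow in exp(); it does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a CNN's last layer for 3 animal classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The highest score gets the highest probability, and the probabilities always add up to 1.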

WaveNet

Generative model for raw audio
WaveNet is a deep autoregressive CNN with stacked layers of dilated convolutions, used for generating speech. To deliver a more human-like voice, WaveNet models the raw waveform of audio signals, making the voice sound more natural and expressive.
In WaveNet, the CNN takes a raw signal as input and synthesizes the output one sample at a time.

Classification

KNN (k-nearest neighbors)

Finds the k most similar instances in the training data to a given query instance, then predicts the output based on the average or majority of the outputs of the k nearest neighbors
Can also handle time series data

Ex: use the last 2 years of air quality data to predict the next 2 days
Identify whether an image contains a logo from among a larger group

Can perform both classification and regression tasks
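A small sketch of KNN covering both modes described above: majority vote for classification and averaging for regression. It assumes Euclidean distance, which is a common choice but not the only one; the function name and data points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k, mode="classify"):
    """Hypothetical helper. train: list of (feature_vector, label_or_value) pairs."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    outputs = [out for _, out in neighbors]
    if mode == "classify":
        # Majority vote among the k nearest neighbors.
        return Counter(outputs).most_common(1)[0][0]
    # Regression: average of the k nearest neighbors' values.
    return sum(outputs) / k

# Hypothetical 2-D points labeled "a" or "b".
points = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
          ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b")]
print(knn_predict(points, (2, 2), k=3))                   # a

# Hypothetical 1-D regression data.
values = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_predict(values, (2,), k=3, mode="regress"))     # 20.0
```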

Latent Dirichlet Allocation (LDA)

Suitable for topic modeling tasks (in NLP)
Discovers the hidden topics and their proportions in a collection of text documents

Ex: news articles, tweets, reviews, etc.

Gensim, scikit-learn, MALLET
Not valid for images

Factorization Machines (FM) algorithm

Used for tasks dealing with high-dimensional sparse datasets

Unsupervised

Topic Modeling

Topic modeling is a type of statistical modeling that uses unsupervised Machine Learning to
identify clusters or groups of similar words within a body of text

BERT-based models

Google developed BERT to serve as a bidirectional transformer model that examines words within text by considering both left-to-right and right-to-left contexts

Use case – predict missing words in text

Unsupervised

Principal component analysis (PCA)

Reduces the dimensionality (number of features) within a dataset while still retaining as much information as possible
Used when the features are highly correlated with each other

Works by finding a new set of features called components

Random Cut Forest (RCF)

Assigns an anomaly score to each data point based on how different it is from the rest of the data

Ex: real-time ingestion; identify anomalous/malicious events

Anomaly detection

Identifies outliers or abnormal patterns in the data

K-means clustering

Randomly assigns data points to a number of clusters, then iteratively updates the cluster centers and reassigns the data points until the clusters are stable

The result is a partition of the data into distinct and homogeneous groups

Use cases – exploratory data analysis, data compression, anomaly detection, and feature extraction
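A minimal sketch of the K-means loop: assign each point to its nearest center, move each center to the mean of its points, and stop when nothing moves. One deliberate simplification, labeled as an assumption: the centers start at the first k points instead of random ones so that the run is reproducible. The function name and data are hypothetical.

```python
import math

def kmeans(points, k, max_iters=100):
    """Hypothetical helper. Returns (centers, clusters)."""
    # Assumption: deterministic init with the first k points (real K-means usually starts randomly).
    centers = [list(p) for p in points[:k]]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center in place
        if new_centers == centers:  # stable: no center moved, so stop
            break
        centers = new_centers
    return centers, clusters

data = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centers, clusters = kmeans(data, k=2)
print([[round(c, 2) for c in center] for center in sorted(centers)])  # [[1.33, 1.33], [8.67, 8.67]]
```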

RMSE (root mean square error)

Goal – to predict a continuous value
Measures the typical size of the difference between the predicted and actual values: the square root of the average of the squared differences

Ex: price of a house, temperature of a city

Good for regression
NOT good for classification
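The RMSE definition above translates almost directly into code. A minimal sketch with made-up house prices (the function name is hypothetical):

```python
import math

def rmse(actual, predicted):
    """Square root of the mean of squared errors; in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical house prices (in $1000s): actual vs. model predictions.
actual = [200, 300, 400]
predicted = [210, 290, 400]
print(round(rmse(actual, predicted), 2))  # 8.16
```

Because errors are squared before averaging, one large miss raises RMSE more than several small ones.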

Regression

MAPE (mean absolute percentage error)

Average of the absolute errors, each expressed as a percentage of the actual value
Used for regression
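A minimal MAPE sketch with made-up numbers. One thing worth knowing for the exam: MAPE divides by the actual value, so it breaks when an actual value is 0. The guard clause below is a design choice for this sketch, not a standard behavior.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Errors of 10%, 10%, and 0% average out to about 6.67%.
print(round(mape([100, 200, 400], [110, 180, 400]), 2))  # 6.67
```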

Receiver operating characteristic (ROC) curve

Used to understand how different classification thresholds will impact the model's performance
A ROC curve shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR) for different thresholds

Ex: predict whether or not a person will order a pizza

Classification (binary)

Area Under the ROC Curve (AUC)

Compare/evaluate ML models
AUC is calculated from the Receiver Operating Characteristic (ROC) curve, which is a plot that shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR) of the classifier as the decision threshold is varied. The TPR is also known as recall or sensitivity.

Ex: credit card transactions – identify 99k valid vs. 1k fraudulent
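A small sketch of how a ROC curve and its AUC are built: sweep the decision threshold from high to low, record (FPR, TPR) at each step, then take the area with the trapezoidal rule. The labels and scores are hypothetical, and the helper names are made up; libraries compute the same quantity more efficiently.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs as the threshold sweeps from high to low (hypothetical helper)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when score >= threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]               # 1 = fraudulent (hypothetical)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # model's fraud scores (hypothetical)
pts = roc_points(labels, scores)
print(round(auc(pts), 3))  # 0.889
```

An AUC of 1.0 means every positive is scored above every negative; 0.5 is no better than random guessing.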

Residual plots

Used to understand whether a regression model is more frequently overestimating or underestimating the target

Confusion matrix

Table that shows the counts of true positives, false positives, true negatives, and false negatives for each class
Used to compute the accuracy, precision, recall, and F1-score of the model for each class

Only applicable to classification models, not regression models. A confusion matrix cannot show the magnitude or direction of the errors made by the model.
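A minimal sketch that counts the four cells of a binary confusion matrix from made-up actual/predicted labels, then derives accuracy from them. The helper name is hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classifier (hypothetical helper)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
c = confusion_counts(actual, predicted)
print(c["TP"], c["FP"], c["TN"], c["FN"])  # 3 1 3 1
accuracy = (c["TP"] + c["TN"]) / len(actual)
print(accuracy)  # 0.75
```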

Precision

Proportion of predicted positive cases that are actually positive. Precision is a useful metric when the cost of a false positive is high
Precision = True Positives / (True Positives + False Positives)
Recall is not a good metric for imbalanced classification problems

Ex: fraudulent transactions, spam detection, or medical diagnosis

Classification

Recall

Same as TPR (true positive rate)
Recall is a useful metric when the cost of a false negative is high
Recall = True Positives / (True Positives + False Negatives)

Ex: fraud detection or cancer diagnosis
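The precision and recall formulas side by side, plus F1 (their harmonic mean), as a minimal sketch. The fraud-detector counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Hypothetical helper; returns 0.0 instead of dividing by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # same as TPR
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical fraud detector: 80 frauds caught, 20 false alarms, 40 frauds missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Here precision is fairly high (few false alarms) but recall is lower (a third of the frauds slip through), which is the trade-off to weigh when false negatives are costly.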

Supervised

Classification (multi-class)


Can handle multiple features and multiple classes

Categorize new products when a dataset of features is provided
Ex: with 15 features (title, weight, price, etc.), categorize books, games, movies, etc. from a dataset of 1200
Credit card fraud detection (ex: with a large dataset of historical data, predict whether new transactions are fraudulent)

Can be used for classification, regression, ranking, and other tasks. It is based on the gradient boosting algorithm, which builds an ensemble of weak learners (usually decision trees) to produce a strong learner

Classification

Term frequency-inverse document frequency (TF-IDF)

Assigns a weight to each word in a document based on how important it is to the meaning of the document
NOTE: The term frequency (TF) measures how often a word appears in a document, while the inverse document frequency (IDF) measures how rare a word is across a collection of documents.
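A minimal TF-IDF sketch in plain Python. It assumes one common variant: TF = count / document length, IDF = log(N / document frequency), with whitespace tokenization. Libraries such as scikit-learn use smoothed variants, so their exact numbers differ. The function name and documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per document (hypothetical helper).

    TF  = count of word in doc / number of words in doc
    IDF = log(N / number of docs containing the word)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tf_idf(docs)
print(w[0]["the"])            # 0.0 (appears in every document, so it carries no signal)
print(round(w[0]["cat"], 3))  # 0.135
```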




Classification / categorize


A technique that can learn distributed representations of words, also known as word embeddings, from large amounts of text data

When tuning parameters doesn't help much, transfer learning would be a better solution





Collaborative Filtering

Recommends products or services to users based on the ratings or preferences of other users

Ex: customer shopping patterns and preferences based on demographics, past visits, and locality
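A toy sketch of user-based collaborative filtering: estimate a user's rating for an item as a similarity-weighted average of other users' ratings for it. The cosine similarity over co-rated items, the ratings data, and all names are assumptions made for illustration; production recommenders (e.g. Amazon Personalize) use far more sophisticated models.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"book": 5, "game": 3, "movie": 4},
    "bob":   {"book": 5, "game": 3, "movie": 5, "album": 4},
    "carol": {"book": 1, "game": 5, "album": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item (None if nobody rated it)."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict_rating("alice", "album"), 2))
```

Alice's tastes match Bob's more closely than Carol's, so Bob's rating of the album pulls the estimate toward 4.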





Decision tree

Performs classification tasks by splitting the data into smaller and purer subsets based on a series of rules or conditions

Ex: binary classifier based on two features: age of account and transaction month

Handles both linear and non-linear data, and can capture complex patterns and interactions among the features









Preprocessing technique


Data normalization

Scale the features to a common range, such as (0, 1) or (-1, 1)


Techniques – min-max scaling, z-score standardization, or unit vector normalization
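A minimal sketch of two of the techniques listed: min-max scaling into a chosen range and z-score standardization. It uses the population standard deviation; that choice, the helper names, and the sample ages are assumptions for illustration.

```python
import statistics

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant feature: map everything to new_min
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]
print(min_max_scale(ages))          # [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max_scale(ages, -1, 1))   # [-1.0, -0.5, 0.0, 0.5, 1.0]
print([round(z, 3) for z in z_score(ages)])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Min-max scaling guarantees a fixed range; z-scores do not, but they are less distorted by a single extreme value.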


Preprocessing technique


Dimensionality reduction

Reduces the number of features




Preprocessing technique


Model regularization

Adds a penalty term to the cost function to prevent overfitting






L1/L2 regularization

Overfitting can be addressed by applying regularization techniques such as L1 or L2 regularization and dropout.
Regularization techniques add a penalty term to the cost function of the model, which helps reduce the complexity of the model and prevents it from overfitting to the training data. Dropout randomly turns off some of the neurons during training, which also helps prevent overfitting.
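To show what "adding a penalty term to the cost function" means, here is a minimal sketch for a linear model: the cost is the mean squared error plus lambda times either the sum of absolute weights (L1) or the sum of squared weights (L2). The helper names, weights, data, and lambda value are all hypothetical; the bias is left unpenalized, which is the usual convention.

```python
def mse(weights, bias, xs, ys):
    """Mean squared error of a linear model y ≈ w·x + b (hypothetical helper)."""
    preds = [sum(w * xi for w, xi in zip(weights, x)) + bias for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def regularized_cost(weights, bias, xs, ys, lam, kind="l2"):
    """Cost = data loss + lam * penalty on the weights."""
    if kind == "l1":
        penalty = sum(abs(w) for w in weights)   # L1 (lasso): tends to drive some weights to 0
    else:
        penalty = sum(w ** 2 for w in weights)   # L2 (ridge): shrinks all weights toward 0
    return mse(weights, bias, xs, ys) + lam * penalty

xs = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
ys = [5.0, 4.0, 7.0]
w, b = [2.0, 1.0], 0.0
print(round(mse(w, b, xs, ys), 4))                                   # 0.3333
print(round(regularized_cost(w, b, xs, ys, lam=0.1, kind="l1"), 4))  # 0.6333
print(round(regularized_cost(w, b, xs, ys, lam=0.1, kind="l2"), 4))  # 0.8333
```

During training, minimizing this larger cost makes big weights "expensive", which is what pushes the model toward simpler fits.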




Preprocessing technique


Data augmentation

Increases the amount of data by creating synthetic data






Poisson distribution

Suitable for modeling the number of events that occur in a fixed interval of time or space, given a known average rate of occurrence

Ex: waiting for a bus, where the interval is 10 minutes and the average rate is 3 minutes
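A minimal sketch of the Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!. To plug numbers in, it assumes the bus example means an average of λ = 3 arrivals per 10-minute interval; that reading is an interpretation, not something the example states outright.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) events in an interval, given an average of lam events per interval."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Assumption: an average of 3 arrivals per 10-minute interval.
lam = 3
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```

The probabilities peak around k = 2 and k = 3 (near the average) and fall off on either side.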





Normal distribution







Binomial distribution







Uniform distribution


















Other Notes

Data preprocessing – the process of preparing raw data (cleaning, transforming, and formatting it) so it is suitable for machine learning models.
Feature engineering – manipulation (addition, deletion, combination, mutation) of your dataset to improve machine learning model training, leading to better performance and greater accuracy.
Exploratory data analysis (EDA) – used by data scientists to analyze and investigate datasets and summarize their main characteristics, often employing data visualization methods.
Hyperparameter tuning – the process of selecting the optimal values for a machine learning model’s hyperparameters.
Transfer learning – a strategy for adapting pre-trained models to new, related tasks without creating models from scratch.

Epochs – helps to improve accuracy
– Increasing the number of epochs during model training allows the model to learn from the data over more iterations, potentially improving its accuracy up to a certain point. This is a common practice when attempting to reach a specific level of accuracy.
– Decreasing the epochs would reduce the training time, possibly preventing the model from reaching the desired accuracy.

Batch Size – affects training speed
– Decreasing the batch size affects training speed and may lead to overfitting, but does not directly relate to achieving a specific accuracy level.

Temperature – affects randomness of predictions
– Increasing the temperature parameter affects the randomness of predictions, not model accuracy.
– Decrease the temperature to produce more consistent responses to the same input prompts.
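A minimal sketch of why temperature changes randomness: many generative models divide their raw scores (logits) by the temperature before applying softmax and sampling. The logits below are hypothetical, and real inference services may add other sampling controls on top of this.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax.

    temperature < 1 sharpens the distribution (more consistent choices);
    temperature > 1 flattens it (more random choices).
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature goes up, the top token's probability shrinks and the others grow, so repeated runs on the same prompt diverge more often.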
