This is the fourth and last week of the fifth course of DeepLearning.AI’s Deep Learning Specialization offered on Coursera. The main topic for this week is transformers, a generalization of the attention model that has taken the deep learning world by storm since its inception in 2017.
This week’s topics are:
- Transformer Network Intuition
- Self-Attention
- Multi-Head Attention
- Transformer Network Architecture
- More Information

## Transformer Network Intuition

We started with RNNs (now considered part of the prehistoric era), a simple model that reuses the same weights at each time step, allowing it to combine the previous step's hidden state with the current input. To address some of the issues with vanilla RNNs, we introduced GRUs and LSTMs, both more flexible and more complex than simple RNNs. However, one thing they all have in common is that the input must be processed sequentially, i.e. one token at a time. This is a problem for large models, where we want to parallelize computation as much as possible. Amdahl's Law gives a theoretical speed-up limit based on the fraction of parallelizable compute in a program; since these models are almost entirely sequential, the achievable speed-ups are minuscule. The transformer architecture lets us process the entire input at once and in parallel, allowing us to train much more complex models which in turn produce richer feature representations of our sequences.
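As a quick illustration (not from the course itself), Amdahl's Law states that the speed-up from parallelizing a fraction p of a program across N workers is S = 1 / ((1 - p) + p / N). The minimal Python sketch below (the function name `amdahl_speedup` is just for this example) shows how a mostly sequential workload, like an RNN unrolled over time, barely benefits from extra hardware, while a highly parallelizable one does.

```python
def amdahl_speedup(parallel_fraction: float, num_workers: int) -> float:
    """Theoretical speed-up from Amdahl's Law: S = 1 / ((1 - p) + p / N),
    where p is the fraction of the work that can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_workers)

# Compare a mostly sequential workload (p=0.10) with a highly parallel one (p=0.95).
for p in (0.10, 0.50, 0.95):
    results = ", ".join(
        f"{n} workers -> {amdahl_speedup(p, n):.2f}x" for n in (2, 8, 64)
    )
    print(f"p={p:.2f}: {results}")
```

Even with 64 workers, a workload that is only 10% parallelizable tops out at roughly 1.1x, whereas one that is 95% parallelizable reaches about 15x, which is the intuition behind replacing sequential recurrence with the fully parallel transformer.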
...