Overview

This is a personal note on hosting Hugging Face models on AWS Lambda for serverless inference, based on the following article.

https://aws.amazon.com/jp/blogs/compute/hosting-hugging-face-models-on-aws-lambda/

Additionally, I cover providing an API using Lambda function URLs and CloudFront.

Hosting Hugging Face Models on AWS Lambda

Preparation

For this section, I referred to the document introduced at the beginning.

https://aws.amazon.com/jp/blogs/compute/hosting-hugging-face-models-on-aws-lambda/

First, run the following commands. I created a virtual environment called venv, but this is not strictly required.

# Clone the project to your development environment
git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda-for-hugging-face.git
cd zero-administration-inference-with-aws-lambda-for-hugging-face

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate

# Install the required dependencies:
pip install -r requirements.txt

# Bootstrap the CDK. This command provisions the initial resources needed by the CDK to perform deployments:
cdk bootstrap

Note

The documentation says to run cdk deploy next. Doing so, I was able to deploy and run tests from the Lambda console. However, several errors occurred when I created and called the Lambda function URL described later, so the following modifications are needed.

inference/*.py

When the function is called via its function URL, parameters arrive in queryStringParameters, so handling for this needs to be added. POST requests require further modifications.

Below is an example of the changes to sentiment.py. By changing the pipeline arguments, you can also run inference with your own custom model.

"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""

import json
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

def handler(event, context):
    # Added: the function URL passes parameters in queryStringParameters
    if "queryStringParameters" in event:
        text = event["queryStringParameters"]["text"]
    else:
        text = event["text"]

    response = {
        "statusCode": 200,
        # "body": nlp(event['text'])[0]
        "body": nlp(text)[0]  # Added
    }
    return response
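The POST case mentioned above can be sketched by also checking the HTTP method in the event. This assumes the Lambda function URL payload format (v2.0), where the method is in requestContext.http.method and a POST payload arrives as a JSON string in body; a stand-in replaces the transformers pipeline so the routing can be run locally, and a real function may additionally need to decode base64 bodies when isBase64Encoded is set:

```python
import json

def fake_nlp(text):
    # Stand-in for pipeline("sentiment-analysis"), so the event routing
    # can be exercised without downloading a model
    return [{"label": "POSITIVE", "score": 0.99}]

def handler(event, context):
    method = event.get("requestContext", {}).get("http", {}).get("method")
    if method == "POST":
        # Function URL POST: JSON string in the "body" field
        text = json.loads(event["body"])["text"]
    elif "queryStringParameters" in event:
        # Function URL GET: parameters in queryStringParameters
        text = event["queryStringParameters"]["text"]
    else:
        # Direct invocation, e.g. a Lambda console test event
        text = event["text"]
    return {"statusCode": 200, "body": fake_nlp(text)[0]}

# Simulated function URL POST event
post_event = {
    "requestContext": {"http": {"method": "POST"}},
    "body": json.dumps({"text": "i am happy"}),
}
print(handler(post_event, None)["body"]["label"])  # POSITIVE
```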

inference/Dockerfile

Next, add a character encoding specification to the Dockerfile.

...(omitted)

FROM huggingface/transformers-pytorch-cpu
# Added
ENV PYTHONIOENCODING="utf8"

# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}

# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}

ENTRYPOINT [ "python3", "-m", "awslambdaric" ]

# This will get replaced by the proper handler by the CDK script
CMD [ "sentiment.handler" ]

Deploy

After that, run the following to deploy.

cdk deploy

Lambda Function URL

The following article is helpful as a reference.

https://dev.classmethod.jp/articles/integrate-aws-lambda-with-cloudfront/

Create a function URL from the Lambda console. This time, I simply specified “NONE” for the “Auth type” and enabled “Configure cross-origin resource sharing (CORS)”.

Additionally, I selected “*” (all) for “Allow methods”.

As a result, you can execute inference from a URL like the following.

https://XXX.lambda-url.us-east-1.on.aws/?text=i am happy

The following result is obtained.

{
  "statusCode": 200,
  "body": {
    "label": "POSITIVE",
    "score": 0.9998801946640015
  }
}
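The space in text=i am happy works in a browser because the browser encodes it automatically; when calling the URL from code, encode the query string explicitly. A minimal sketch, using the placeholder host from above:

```python
from urllib.parse import quote, urlencode

# Placeholder base URL; replace with your actual function URL
base = "https://XXX.lambda-url.us-east-1.on.aws/"
# quote_via=quote yields %20 for spaces instead of the default "+"
query = urlencode({"text": "i am happy"}, quote_via=quote)
url = f"{base}?{query}"
print(url)  # https://XXX.lambda-url.us-east-1.on.aws/?text=i%20am%20happy
```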

CloudFront

For this configuration as well, the following article is helpful as a reference.

https://dev.classmethod.jp/articles/integrate-aws-lambda-with-cloudfront/

The important point is to set Query strings to All in the Origin request settings. Without this, the same result will be returned even when the URL query string is changed.
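For reference, the same setting expressed as a CloudFront origin request policy configuration might look like the following (a sketch of the CreateOriginRequestPolicy config shape; the policy name is illustrative):

```json
{
  "Name": "forward-all-query-strings",
  "Comment": "Forward all query strings to the Lambda function URL origin",
  "HeadersConfig": { "HeaderBehavior": "none" },
  "CookiesConfig": { "CookieBehavior": "none" },
  "QueryStringsConfig": { "QueryStringBehavior": "all" }
}
```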

As a result, you can execute inference from a URL like the following. It is also possible to assign a custom domain.

https://yyy.cloudfront.net/?text=i am happy

Summary

I introduced how to host Hugging Face models on AWS Lambda for serverless inference. There may be some inaccuracies, but I hope this serves as a useful reference.