Overview

These are notes on connecting Django to AWS OpenSearch. I followed the article below.

https://testdriven.io/blog/django-drf-elasticsearch/

That article targets Elasticsearch, however, so a few changes were needed to make it work with OpenSearch.

Changes

The changes start at the Elasticsearch Setup section of the article.

https://testdriven.io/blog/django-drf-elasticsearch/#elasticsearch-setup

Specifically, the following two libraries were required.

(env)$ pip install opensearch-py
(env)$ pip install django-opensearch-dsl
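
The article's setup section also registers the library in settings.py, and the same step applies here. The fragment below is a minimal sketch rather than my exact configuration: the app list is abbreviated, and the OPENSEARCH_HOST environment variable and the localhost fallback are assumptions for illustration. For AWS OpenSearch, the host would be the domain endpoint.

```python
# settings.py (fragment)
import os

INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "rest_framework",
    "blog",
    "search",
    "django_opensearch_dsl",  # takes the place of django_elasticsearch_dsl
]

# Takes the place of the ELASTICSEARCH_DSL setting used in the article.
# OPENSEARCH_HOST is a hypothetical variable name; use whatever your
# deployment provides.
OPENSEARCH_DSL = {
    "default": {
        "hosts": os.environ.get("OPENSEARCH_HOST", "localhost:9200"),
    },
}
```

As far as I understand, the entries under each connection name are passed through to the opensearch-py client, so any authentication options the domain requires would go in the same dictionary.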

After that, replacing the django_elasticsearch_dsl imports with django_opensearch_dsl and the elasticsearch_dsl imports with opensearchpy let me proceed as described in the article.

For example, it looks like this:

# blog/documents.py

from django.contrib.auth.models import User
from django_opensearch_dsl import Document, fields # Changed to opensearch
from django_opensearch_dsl.registries import registry # Changed to opensearch

from blog.models import Category, Article


@registry.register_document
class UserDocument(Document):
    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            'username',
        ]


@registry.register_document
class CategoryDocument(Document):
    id = fields.IntegerField()

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Category
        fields = [
            'name',
            'description',
        ]


@registry.register_document
class ArticleDocument(Document):
    author = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'username': fields.TextField(),
    })
    categories = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'name': fields.TextField(),
        'description': fields.TextField(),
    })
    type = fields.TextField(attr='type_to_string')

    class Index:
        name = 'articles'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Article
        fields = [
            'title',
            'content',
            'created_datetime',
            'updated_datetime',
        ]

Populate Elasticsearch

For Elasticsearch, the article populates the indices with the following command.

python manage.py search_index --rebuild

With django-opensearch-dsl, the following commands were needed instead.

Creating Indices

python manage.py opensearch index create
The following indices will be created:
        - users.
        - categories.
        - articles.

Continue ? [y]es [n]o : y

Creating index 'users'... OK
Creating index 'categories'... OK
Creating index 'articles'... OK

Registering Documents

python manage.py opensearch document index
The following documents will be indexed:
        - 5 User.
        - 3 Category.
        - 5 Article.

Continue ? [y]es [n]o : y

Indexing 5 User: OK
Indexing 3 Category: OK
Indexing 5 Article: OK

5 User successfully indexed, 0 errors:
3 Category successfully indexed, 0 errors:
5 Article successfully indexed, 0 errors:

Rebuilding Indices

python manage.py opensearch index rebuild

Additional: Adding Analyzers and Fields

I also tried the field classes described in the following section of the django-opensearch-dsl documentation.

https://django-opensearch-dsl.readthedocs.io/en/latest/fields/#field-classes

In the following example, username is declared as a text field that uses a custom html_strip analyzer and has a keyword sub-field named raw.

# blog/documents.py

from django.contrib.auth.models import User
from django_opensearch_dsl import Document, fields
from django_opensearch_dsl.registries import registry

from blog.models import Category, Article

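from opensearchpy import analyzer

# Custom analyzer: strip HTML, split with the standard tokenizer,
# then lowercase, drop stop words, and stem with snowball.
html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)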

@registry.register_document
class UserDocument(Document):

    username = fields.TextField(
        analyzer=html_strip,
        fields={'raw': fields.KeywordField()}
    )

    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            # 'username' is declared explicitly above, so it is not listed here
        ]

As a result of the above, the following mapping was registered in OpenSearch.

{
  "users" : {
    "mappings" : {
      "properties" : {
        "first_name" : {
          "type" : "text"
        },
        "id" : {
          "type" : "integer"
        },
        "last_name" : {
          "type" : "text"
        },
        "username" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "keyword"
            }
          },
          "analyzer" : "html_strip"
        }
      }
    }
  }
}
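
The mapping only refers to the analyzer by name; the definition itself lives in the index settings. The sketch below shows, as a plain Python dict, the shape I would expect the index creation body to take for the html_strip analyzer defined earlier. It is written by hand from the analyzer arguments, not copied from OpenSearch, so treat the exact layout as an assumption.

```python
import json

# Hand-written from the analyzer() arguments in documents.py; not actual
# output from OpenSearch.
html_strip_analysis = {
    "analyzer": {
        "html_strip": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "stop", "snowball"],
            "char_filter": ["html_strip"],
        }
    }
}

# The Index.settings from UserDocument plus the analysis section.
index_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "analysis": html_strip_analysis,
    }
}

print(json.dumps(index_body, indent=2))
```

Comparing this with the settings that OpenSearch reports for the users index is one way to confirm the analyzer was registered as intended.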

Because username.raw is a keyword field, it can be used for sorting and aggregation. Below is an example view. Prefixing the field name with - sorts in descending order.

# search/views.py

import abc

from django.http import HttpResponse
from opensearchpy import Q

from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView

from blog.documents import ArticleDocument, UserDocument, CategoryDocument
from blog.serializers import ArticleSerializer, UserSerializer, CategorySerializer

class PaginatedOpenSearchAPIView(APIView, LimitOffsetPagination):
    serializer_class = None
    document_class = None

    @abc.abstractmethod
    def generate_q_expression(self, query):
        """This method should be overridden
        and return a Q() expression."""

    def get(self, request, query):
        try:
            q = self.generate_q_expression(query)
            # Sort on the keyword sub-field; the leading "-" means descending.
            # username.raw is defined on UserDocument above.
            search = self.document_class.search().query(q).sort(
                "-username.raw"
            )
            response = search.execute()

            print(
                f'*** Found {response.hits.total.value} hit(s) for query: "{query}"')

            # Paginate the hits and serialize the current page.
            results = self.paginate_queryset(response, request, view=self)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            # Log the error and return its message with a 500 status.
            print(e)
            return HttpResponse(str(e), status=500)
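
To make the sort and aggregation on username.raw more concrete, here is a rough sketch of the request body such a search corresponds to, built with plain dicts so it runs without a cluster. The helper names sort_clause and build_user_search_body, the aggregation name usernames, and the bool/match query (modeled on the user search in the article) are all assumptions for illustration, not part of opensearch-py.

```python
import json


def sort_clause(*keys):
    """Turn DSL-style sort keys ("field" or "-field") into request-body entries.

    A leading "-" becomes order "desc". Ascending keys are written out
    explicitly here for clarity.
    """
    clause = []
    for key in keys:
        if key.startswith("-"):
            clause.append({key[1:]: {"order": "desc"}})
        else:
            clause.append({key: {"order": "asc"}})
    return clause


def build_user_search_body(query, sort_keys=("-username.raw",)):
    """Build a search body for the users index (hypothetical helper)."""
    return {
        # Match the query against any of the user's text fields.
        "query": {
            "bool": {
                "should": [
                    {"match": {"username": query}},
                    {"match": {"first_name": query}},
                    {"match": {"last_name": query}},
                ],
                "minimum_should_match": 1,
            }
        },
        # Sorting uses the keyword sub-field, not the analyzed text field.
        "sort": sort_clause(*sort_keys),
        # A terms aggregation also needs the keyword sub-field.
        "aggs": {
            "usernames": {"terms": {"field": "username.raw"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_user_search_body("mike"), indent=2))
```

Passing "username.raw" without the leading - to sort_keys gives ascending order instead.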

Summary

I hope this serves as a useful reference for connecting Django with AWS OpenSearch.