Overview
These are notes on connecting Django to AWS OpenSearch. I followed the article below.
https://testdriven.io/blog/django-drf-elasticsearch/
Because that article targets Elasticsearch, several changes were needed to make it work with OpenSearch.
Changes
Changes for OpenSearch were needed starting from the Elasticsearch Setup section of the article.
https://testdriven.io/blog/django-drf-elasticsearch/#elasticsearch-setup
Specifically, the following two libraries were required.
(env)$ pip install opensearch-py
(env)$ pip install django-opensearch-dsl
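Once the packages are installed, the Django settings need the OpenSearch equivalents of what the article's setup section configures. The excerpt below is a minimal sketch, not the actual settings file from this project: the app list assumes the blog and search apps from the article, and OPENSEARCH_HOST is a hypothetical environment variable. An AWS-hosted domain will typically also need an HTTPS endpoint and credentials, which are not covered here.

```python
# settings.py (excerpt): a sketch under the assumptions stated above
import os

INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "rest_framework",
    "django_opensearch_dsl",  # takes the place of "django_elasticsearch_dsl"
    "blog",
    "search",
]

# django-opensearch-dsl reads its connections from OPENSEARCH_DSL,
# where the Elasticsearch article uses ELASTICSEARCH_DSL.
OPENSEARCH_DSL = {
    "default": {
        # OPENSEARCH_HOST is a hypothetical variable name; the fallback
        # points at a local development node.
        "hosts": os.environ.get("OPENSEARCH_HOST", "localhost:9200"),
    },
}
```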
After that, by replacing the django_elasticsearch_dsl module with django_opensearch_dsl and the elasticsearch_dsl module with opensearchpy in the imports, I was able to proceed as described in the article.
For example, it looks like this:
# blog/documents.py

from django.contrib.auth.models import User
from django_opensearch_dsl import Document, fields  # Changed to opensearch
from django_opensearch_dsl.registries import registry  # Changed to opensearch

from blog.models import Category, Article


@registry.register_document
class UserDocument(Document):
    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            'username',
        ]


@registry.register_document
class CategoryDocument(Document):
    id = fields.IntegerField()

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Category
        fields = [
            'name',
            'description',
        ]


@registry.register_document
class ArticleDocument(Document):
    author = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'username': fields.TextField(),
    })
    categories = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'name': fields.TextField(),
        'description': fields.TextField(),
    })
    type = fields.TextField(attr='type_to_string')

    class Index:
        name = 'articles'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Article
        fields = [
            'title',
            'content',
            'created_datetime',
            'updated_datetime',
        ]
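The renaming is mechanical, so it can be scripted when porting several files from the article. The helper below is only a sketch of that idea, not something the article or either library provides. The port_imports name is made up for illustration, and the result should still be reviewed by hand.

```python
import re

# Elasticsearch module name -> OpenSearch module name, as described in this post.
RENAMES = {
    "django_elasticsearch_dsl": "django_opensearch_dsl",
    "elasticsearch_dsl": "opensearchpy",
}

# "_" counts as a word character, so \b prevents "elasticsearch_dsl" from
# matching inside "django_elasticsearch_dsl".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def port_imports(source: str) -> str:
    """Return source with the Elasticsearch module names swapped (hypothetical helper)."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


sample = (
    "from django_elasticsearch_dsl import Document, fields\n"
    "from django_elasticsearch_dsl.registries import registry\n"
    "from elasticsearch_dsl import Q\n"
)
print(port_imports(sample))
```

Running it on the three sample lines yields the django_opensearch_dsl and opensearchpy imports used throughout these notes.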
Populate OpenSearch
The article, which targets Elasticsearch, populates the indices with the following command.
python manage.py search_index --rebuild
With django-opensearch-dsl, the following commands were needed instead.
Creating Indices
python manage.py opensearch index create
The following indices will be created:
- users.
- categories.
- articles.
Continue ? [y]es [n]o : y
Creating index 'users'... OK
Creating index 'categories'... OK
Creating index 'articles'... OK
Indexing Documents
python manage.py opensearch document index
The following documents will be indexed:
- 5 User.
- 3 Category.
- 5 Article.
Continue ? [y]es [n]o : y
Indexing 5 User: OK
Indexing 3 Category: OK
Indexing 5 Article: OK
5 User successfully indexed, 0 errors:
3 Category successfully indexed, 0 errors:
5 Article successfully indexed, 0 errors:
Rebuilding Indices
python manage.py opensearch index rebuild
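The create and index steps above can also be chained from a small script, for example when resetting a development environment. This is a rough sketch that uses only the commands shown in this post. It assumes the confirmation prompt reads its answer from standard input, which should be checked against the installed version. The build_argv and populate names are made up for illustration.

```python
import subprocess
import sys

# The two management commands from this post, in the order they were run.
COMMANDS = [
    ["manage.py", "opensearch", "index", "create"],
    ["manage.py", "opensearch", "document", "index"],
]


def build_argv(command, python=sys.executable):
    """Prefix a management command with the interpreter that should run it."""
    return [python, *command]


def populate(answer="y\n"):
    """Run each command, answering the "Continue ?" prompt via stdin (assumption)."""
    for command in COMMANDS:
        subprocess.run(build_argv(command), input=answer, text=True, check=True)


# Show what would be executed; call populate() from the project root to run it.
for command in COMMANDS:
    print(" ".join(build_argv(command, python="python")))
```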
Additional: Adding Analyzers and Fields
I also tried the field classes described in the following section.
https://django-opensearch-dsl.readthedocs.io/en/latest/fields/#field-classes
In the following example, username is declared as a text field that uses an html_strip analyzer and has a keyword sub-field named raw.
# blog/documents.py

from django.contrib.auth.models import User
from django_opensearch_dsl import Document, fields
from django_opensearch_dsl.registries import registry

from blog.models import Category, Article
from opensearchpy import analyzer, tokenizer

# Custom analyzer: strip HTML, then lowercase, drop stop words, and stem.
html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)


@registry.register_document
class UserDocument(Document):
    # Analyzed text for full-text search, plus an un-analyzed "raw" sub-field.
    username = fields.TextField(
        analyzer=html_strip,
        fields={'raw': fields.KeywordField()}
    )

    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            # 'username',  # declared explicitly above
        ]
As a result of the above, the following mapping was registered in OpenSearch.
{
  "users" : {
    "mappings" : {
      "properties" : {
        "first_name" : {
          "type" : "text"
        },
        "id" : {
          "type" : "integer"
        },
        "last_name" : {
          "type" : "text"
        },
        "username" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "keyword"
            }
          },
          "analyzer" : "html_strip"
        }
      }
    }
  }
}
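To see at a glance which paths in a mapping like this are keyword fields, the JSON can be walked with a few lines of plain Python. The sketch below works on a compact copy of the mapping shown above. The keyword_paths function is a hypothetical helper written for this note and is not part of opensearch-py.

```python
import json

# Compact copy of the mapping registered for the users index.
MAPPING_JSON = """
{"users": {"mappings": {"properties": {
  "first_name": {"type": "text"},
  "id": {"type": "integer"},
  "last_name": {"type": "text"},
  "username": {"type": "text",
               "fields": {"raw": {"type": "keyword"}},
               "analyzer": "html_strip"}
}}}}
"""


def keyword_paths(properties, prefix=""):
    """Collect dotted paths of every keyword field, including multi-fields."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "keyword":
            paths.append(path)
        # Multi-fields live under "fields", object sub-fields under "properties".
        for key in ("fields", "properties"):
            if key in spec:
                paths.extend(keyword_paths(spec[key], prefix=f"{path}."))
    return paths


mapping = json.loads(MAPPING_JSON)
print(keyword_paths(mapping["users"]["mappings"]["properties"]))
# prints ['username.raw']
```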
Sorting and aggregation become possible through username.raw, because a keyword field is stored without analysis. Below is an example view. Prefixing the field name with - sorts in descending order.
# search/views.py

import abc

from django.http import HttpResponse
from opensearchpy import Q
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView

from blog.documents import ArticleDocument, UserDocument, CategoryDocument
from blog.serializers import ArticleSerializer, UserSerializer, CategorySerializer


class PaginatedOpenSearchAPIView(APIView, LimitOffsetPagination):
    serializer_class = None
    document_class = None

    @abc.abstractmethod
    def generate_q_expression(self, query):
        """This method should be overridden
        and return a Q() expression."""

    def get(self, request, query):
        try:
            q = self.generate_q_expression(query)
            search = self.document_class.search().query(q).sort(
                "-username.raw"
            )
            response = search.execute()

            print(
                f'*** Found {response.hits.total.value} hit(s) for query: "{query}"')

            results = self.paginate_queryset(response, request, view=self)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            print(e)
            return HttpResponse(e, status=500)
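It can help to picture the search request as the raw JSON body that ends up at OpenSearch. In that body the "-username.raw" sort string becomes an explicit order of desc on username.raw. The sketch below builds such a body by hand with plain dictionaries. The multi_match query and the usernames aggregation name are illustrative assumptions, since the real query comes from whatever generate_q_expression returns in each subclass.

```python
import json


def build_user_search_body(query, descending=True):
    """Build a search body that sorts and aggregates on username.raw (sketch)."""
    return {
        # Hypothetical query; the view gets its query from generate_q_expression().
        "query": {
            "multi_match": {
                "query": query,
                "fields": ["username", "first_name", "last_name"],
            }
        },
        # Expanded form of the "-username.raw" / "username.raw" sort strings.
        "sort": [{"username.raw": {"order": "desc" if descending else "asc"}}],
        # A terms aggregation needs the keyword sub-field, not the analyzed text.
        "aggs": {"usernames": {"terms": {"field": "username.raw"}}},
    }


print(json.dumps(build_user_search_body("jess"), indent=2))
```

Sorting or aggregating on plain username instead would target the analyzed text field, which is why the raw sub-field was added to the mapping.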
Summary
I hope this serves as a useful reference for connecting Django with AWS OpenSearch.