IT teams, data analysts and other users often need to search unstructured or semistructured data with free-form search strings. To address this need natively in AWS, Amazon Elasticsearch Service and Amazon Kendra are both candidates.
While both services can perform queries across data sources, they serve different purposes. In this article, we'll provide a high-level overview of these AWS offerings and see how they differ so you can evaluate which best fits your application's needs.
Amazon Elasticsearch Service 101
Elasticsearch is an open source search and analytics platform. AWS offers a managed version of the software, Amazon Elasticsearch Service, which delivers compute capacity through Amazon EC2 instances.
Amazon Elasticsearch Service supports analysis of structured and unstructured data through Elasticsearch's JSON-based query language, the Query DSL. The query language is flexible and can target specific or multiple fields and supports operators (AND, OR, NOT, etc.), wildcards, regular expressions, ranges and grouping, among other features. Queries can be submitted through REST APIs that integrate with custom applications, or through a GUI such as Kibana.
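To illustrate how those operators map onto the Query DSL, here is a minimal Python sketch that builds a search body as a plain dictionary. The field names `message`, `host`, `path`, `status` and `bytes` are hypothetical; the clause types (`bool`, `match`, `range`, `wildcard`, `regexp`, `term`) are standard Query DSL. In practice the body would be sent as JSON to a domain's `/<index>/_search` endpoint; sending and request signing are not shown.

```python
import json


def build_search_body(keyword: str, host_pattern: str, min_bytes: int,
                      max_bytes: int, excluded_status: int) -> dict:
    """Build a Query DSL body combining AND, OR and NOT logic (hypothetical fields)."""
    return {
        "query": {
            "bool": {
                # AND: every clause in "must" has to match.
                "must": [
                    {"match": {"message": keyword}},
                    {"range": {"bytes": {"gte": min_bytes, "lte": max_bytes}}},
                ],
                # OR: at least one of these clauses has to match.
                "should": [
                    {"wildcard": {"host": host_pattern}},
                    {"regexp": {"path": "/api/v[0-9]+/.*"}},
                ],
                "minimum_should_match": 1,
                # NOT: documents matching these clauses are excluded.
                "must_not": [
                    {"term": {"status": excluded_status}},
                ],
            }
        },
        "size": 20,
    }


body = build_search_body("timeout", "web-*", 1024, 65536, 200)
print(json.dumps(body, indent=2))
```

A web form with explicit fields would typically fill in the function's parameters, which is why Elasticsearch suits structured, field-by-field search screens.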
Before data can be analyzed, it has to be ingested into an Elasticsearch cluster. This can be done through built-in integrations with AWS services such as Amazon Kinesis Data Firehose, Amazon CloudWatch Logs and AWS IoT. Data can also be ingested through Logstash, which is part of the same open source stack as Elasticsearch and can be deployed on separate EC2 instances. Alternatively, custom applications can ingest data through the REST API.
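For the custom-application route, the Elasticsearch `_bulk` REST endpoint accepts many documents in one request as newline-delimited JSON. The sketch below only prepares the request with the standard library; the domain endpoint and index name are hypothetical, and a real call would usually also need AWS Signature Version 4 signing unless the domain's access policy permits unsigned requests.

```python
import json
import urllib.request


def build_bulk_payload(index: str, docs: list[dict]) -> bytes:
    """Serialize documents in the newline-delimited JSON format used by _bulk."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_bulk_request(endpoint: str, index: str, docs: list[dict]) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the domain's _bulk endpoint."""
    return urllib.request.Request(
        url=f"{endpoint}/_bulk",
        data=build_bulk_payload(index, docs),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Hypothetical endpoint and index; signing is intentionally omitted.
req = build_bulk_request(
    "https://search-my-domain.us-east-1.es.amazonaws.com",
    "app-logs",
    [
        {"host": "web-1", "status": 500, "message": "upstream timeout"},
        {"host": "web-2", "status": 200, "message": "ok"},
    ],
)
```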
Amazon Kendra basics
Amazon Kendra relies on machine learning to search data stored in multiple sources.
Developers incorporate Amazon Kendra into their applications so end users can search semistructured and unstructured data via natural language. The service has features to automatically or manually fine-tune the accuracy of search results over time.
Kendra is serverless, so application owners don't have to manage the underlying infrastructure that performs searches. However, they might have to manage data sources, depending on where data is stored.
Kendra organizes searchable content into indexes; each index holds a collection of documents and FAQs that represents the data the service can access. Documents can be explicitly ingested into Kendra indexes or placed in Amazon S3 for Kendra to access. The service also supports connectors to a number of external data sources, such as Salesforce, OneDrive, Confluence, ServiceNow, Google Drive, SharePoint and Amazon Relational Database Service (RDS).
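Explicit ingestion goes through Kendra's `BatchPutDocument` API. The sketch below wraps the boto3 `batch_put_document` call and expects a client created with `boto3.client("kendra")`; the client is passed in so the function can be exercised without AWS access. The batch size of 10 is an assumption about the per-call document limit and should be checked against current Kendra quotas; document IDs and text are placeholders.

```python
from typing import Any


def put_text_documents(kendra_client: Any, index_id: str, docs: dict[str, str],
                       batch_size: int = 10) -> list[dict]:
    """Add plain-text documents to a Kendra index in batches.

    kendra_client: expected to be boto3.client("kendra").
    batch_size: assumed per-call limit; verify against current quotas.
    Returns the FailedDocuments entries reported by Kendra.
    """
    failed: list[dict] = []
    items = list(docs.items())
    for start in range(0, len(items), batch_size):
        batch = [
            {
                "Id": doc_id,
                "Title": doc_id,
                "Blob": text.encode("utf-8"),  # raw bytes; boto3 handles encoding
                "ContentType": "PLAIN_TEXT",
            }
            for doc_id, text in items[start:start + batch_size]
        ]
        response = kendra_client.batch_put_document(IndexId=index_id, Documents=batch)
        # Rejected documents are reported in the response rather than raised.
        failed.extend(response.get("FailedDocuments", []))
    return failed
```

Returning the failures instead of raising lets a caller retry only the documents Kendra rejected.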
Once Kendra indexes are created and the underlying documents are added, applications can query the data using the AWS SDK. AWS offers a sample project in its documentation with code examples that show how to query documents from a custom web interface.
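As a rough idea of what an SDK query looks like, the sketch below sends a natural-language question through the boto3 Kendra `query` call and flattens the `ResultItems` it returns. It assumes a client from `boto3.client("kendra")` and a placeholder index ID; it is not the AWS sample project mentioned above.

```python
from typing import Any


def ask_kendra(kendra_client: Any, index_id: str, question: str,
               limit: int = 5) -> list[dict]:
    """Send a natural-language question to a Kendra index and summarize the hits.

    kendra_client: expected to be boto3.client("kendra").
    """
    response = kendra_client.query(IndexId=index_id, QueryText=question)
    results = []
    for item in response.get("ResultItems", [])[:limit]:
        results.append({
            "type": item.get("Type"),  # e.g. ANSWER, QUESTION_ANSWER or DOCUMENT
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return results


# Example usage (requires AWS credentials and an existing index):
#   import boto3
#   for hit in ask_kendra(boto3.client("kendra"), "<index-id>", "How much is product X?"):
#       print(hit["type"], hit["title"], hit["excerpt"])
```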
Amazon Kendra vs. Elasticsearch Service -- comparing services
Before choosing either Kendra or Elasticsearch, application owners need to know that these services solve very different problems. While there's a small overlap when it comes to keyword searches, most other areas present significant differences.
Within an organization, Kendra is intended to improve productivity by giving employees faster, more accurate search results about the information they need to do their jobs. It can also help teams track regulatory requirements to inform and enforce compliance policies. Externally, developers can incorporate it into customer interactions, including custom web searches and chatbots.
Amazon Elasticsearch Service can also power custom search experiences for users, but it has a much broader set of capabilities. It collects, stores and analyzes log data, which can then feed efforts to monitor infrastructure, applications and security.
Both tools can serve internal or external users, depending on the data they access and the intended user experience. It's also entirely possible for the same organization to deploy both for different use cases.
Capabilities and cost
While Kendra is serverless, Amazon Elasticsearch Service requires developers to provision compute capacity and storage. Even though Elasticsearch Service is managed by AWS, it still requires explicit capacity configurations, such as instance types and size, disk storage, number of nodes, redundancy and so on.
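To make the contrast concrete, the sketch below builds the request parameters for the boto3 `es` client's `create_elasticsearch_domain` call next to those for the Kendra client's `create_index` call. The domain name, version, instance count, volume size and role ARN are illustrative assumptions, not recommendations; the point is how many capacity decisions the Elasticsearch side requires.

```python
def es_domain_params(name: str) -> dict:
    """Capacity settings a caller must choose for an Amazon Elasticsearch Service domain."""
    return {
        "DomainName": name,
        "ElasticsearchVersion": "7.10",
        "ElasticsearchClusterConfig": {
            "InstanceType": "m5.large.elasticsearch",  # instance type and size
            "InstanceCount": 3,                        # number of data nodes
            "ZoneAwarenessEnabled": True,              # redundancy across Availability Zones
            "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
        },
        # Disk storage: EBS volume per data node, size in GiB.
        "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 20},
    }


def kendra_index_params(name: str, role_arn: str) -> dict:
    """A Kendra index, by contrast, has no instance or storage sizing to choose."""
    return {"Name": name, "Edition": "DEVELOPER_EDITION", "RoleArn": role_arn}


# Example usage (requires AWS credentials):
#   import boto3
#   boto3.client("es").create_elasticsearch_domain(**es_domain_params("my-domain"))
#   boto3.client("kendra").create_index(**kendra_index_params("my-index", "<role-arn>"))
```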
In addition, Kendra handles natural language searches, while Amazon Elasticsearch Service requires a specific query format -- for example, web forms with explicit fields and ranges. Amazon Elasticsearch Service offers some flexibility with keyword search, but less than what Kendra provides through its natural language capabilities. For example, it cannot interpret questions such as "How much is product X?" or "Where can I find item Y?"; it would simply match the individual keywords.
In terms of search capabilities, Amazon Elasticsearch Service is likely the better choice for explicit queries that involve ranges, operators, functions, aggregations, conditions or specific values. However, it doesn't offer AI-based natural language understanding to interpret search queries.
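As an example of the explicit, aggregation-style query where Elasticsearch shines, the sketch below builds a Query DSL body that filters log entries to a time window and to error status codes, then counts them per status and averages response size. The `@timestamp`, `status` and `bytes` field names are hypothetical; `filter`, `range`, `terms` and `avg` are standard Query DSL and aggregation types.

```python
import json


def build_error_breakdown(start: str, end: str) -> dict:
    """Count error log entries per HTTP status within a time window (hypothetical fields)."""
    return {
        "size": 0,  # return only aggregation results, not the matching documents
        "query": {
            "bool": {
                # Filters narrow the result set without affecting relevance scoring.
                "filter": [
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                    {"range": {"status": {"gte": 400}}},
                ]
            }
        },
        "aggs": {
            "by_status": {"terms": {"field": "status", "size": 10}},
            "avg_bytes": {"avg": {"field": "bytes"}},
        },
    }


body = build_error_breakdown("2021-01-01T00:00:00Z", "2021-01-02T00:00:00Z")
print(json.dumps(body, indent=2))
```

A question like "how many 5xx errors did each status code produce yesterday?" maps cleanly onto this structure, whereas Kendra is built to return documents and answers rather than computed aggregates.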
Kendra can explicitly ingest data into an index, but data can also stay in its original source and be accessed from there. Amazon Elasticsearch Service doesn't offer that option, since data has to be explicitly ingested, stored and maintained in the Elasticsearch cluster itself. For connected data sources, however, Kendra requires application owners to synchronize data, either on demand or on a schedule, so that searches return the latest results.
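An on-demand sync can be triggered through the SDK. The sketch below uses the boto3 Kendra calls `start_data_source_sync_job` and `list_data_source_sync_jobs`, assuming a client from `boto3.client("kendra")` and placeholder index and data source IDs. For brevity it reads only the first page of sync history and ignores pagination, which a production version would need to handle.

```python
from typing import Any, Optional


def sync_data_source(kendra_client: Any, index_id: str, data_source_id: str) -> str:
    """Start an on-demand sync of a connected data source; returns the job's ExecutionId."""
    response = kendra_client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return response["ExecutionId"]


def sync_status(kendra_client: Any, index_id: str, data_source_id: str,
                execution_id: str) -> Optional[str]:
    """Look up the status of a sync job (first page of history only)."""
    response = kendra_client.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    for job in response.get("History", []):
        if job["ExecutionId"] == execution_id:
            return job["Status"]  # e.g. SYNCING, SUCCEEDED or FAILED
    return None
```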
Pricing could be a key factor when an IT team is weighing Amazon Elasticsearch Service vs. Kendra. Kendra is billed by the number of indexes and the volume of data accessed. As of publication, the minimum cost per index is $2.50 per hour ($1,825 per month) for the Developer Edition and $7.50 per hour ($5,475 per month) for the Enterprise Edition.
On the other hand, a single m5.large Amazon Elasticsearch Service node costs $0.142 per hour, so application owners could deploy a fairly large 15-node m5.large cluster (about $2.13 per hour) for less than the cost of one Kendra Developer Edition index. This means a single-node development Elasticsearch cluster could cost around $100 per month in instance charges, while the most basic Kendra index would be $1,800 or more a month.
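The comparison above comes down to simple arithmetic. The sketch below reproduces it using the publication-time hourly rates quoted in this article and the 730-hour month implied by the $1,825 figure. It counts instance or index hours only; storage, data transfer and other charges are deliberately excluded, so treat the results as rough lower bounds.

```python
HOURS_PER_MONTH = 730  # the monthly figures above imply a 730-hour month

# Publication-time on-demand rates quoted in this article (USD per hour).
KENDRA_DEV_HOURLY = 2.50
KENDRA_ENTERPRISE_HOURLY = 7.50
ES_M5_LARGE_HOURLY = 0.142  # per node


def monthly_cost(hourly_rate: float, nodes: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Instance/index-hour cost only, rounded to cents."""
    return round(hourly_rate * nodes * hours, 2)


print(monthly_cost(KENDRA_DEV_HOURLY))               # Kendra Developer Edition index
print(monthly_cost(KENDRA_ENTERPRISE_HOURLY))        # Kendra Enterprise Edition index
print(monthly_cost(ES_M5_LARGE_HOURLY))              # single-node development cluster
print(monthly_cost(ES_M5_LARGE_HOURLY, nodes=15))    # 15-node m5.large cluster
```

Even the 15-node cluster (roughly $1,555 per month) stays below a single Developer Edition index, which matches the comparison in the text.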