Show HN: Nixiesearch, an open-source alternative to Elasticsearch Serverless




Nixiesearch is a modern search engine that runs on S3-compatible storage. We built it after dealing with the headaches of running large Elastic/OpenSearch clusters (here's the blog post full of pain), and here’s why it’s awesome:


Search is never easy, but Nixiesearch has your back. It takes care of the toughest parts—like reindexing, capacity planning, and maintenance—so you can save time (and your sanity).

Note: Want to learn more? Go straight to the quickstart and check out the live demo.

  • Nixiesearch is not a database, and was never meant to be. Nixiesearch is a search index for consumer-facing apps to find the top-N most relevant documents for a query. For analytical cases, consider using good old SQL with ClickHouse or Snowflake.
  • Not a tool to search for logs. Log search is about throughput, and Nixiesearch is about relevance. If you plan to use Nixiesearch as a log storage system, please don't: consider ELK or Quickwit as better alternatives.

Our elasticsearch cluster has been a pain in the ass since day one with the main fix always "just double the size of the server" to the point where our ES cluster ended up costing more than our entire AWS bill pre-ES [HN source]

When your search cluster turns red again because you accidentally sent the wrong JSON to the wrong REST endpoint, you can just write your own S3-based search engine like the big guys do:

Nixiesearch was inspired by these search engines, but is open-source. Decoupling search and storage makes ops simpler. Making your search configuration immutable makes it simpler still.


How is it different from popular search engines?

Get the sample MSRD (Movie Search Ranking Dataset) file:

curl -L -o movies.jsonl https://nixiesearch.ai/data/movies.jsonl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0   3636      0 --:--:-- --:--:-- --:--:--  3681
100 32085  100 32085    0     0   226k      0 --:--:-- --:--:-- --:--:--  226k
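Before indexing, it can be worth a quick check that the download really is JSONL and not, say, an HTML error page. The helpers below are a minimal sketch using only standard tools. The function names are made up for illustration and are not part of Nixiesearch.

```shell
# Hypothetical helper: succeed if the first line of the file starts with "{",
# i.e. it looks like a JSON document rather than an HTML page or an empty file.
looks_like_jsonl() {
  head -n 1 "$1" | grep -q '^{'
}

# Hypothetical helper: print the number of non-empty lines,
# which for a JSONL file is the number of documents.
count_docs() {
  grep -c . "$1"
}

# Usage, after the download above:
#   looks_like_jsonl movies.jsonl && count_docs movies.jsonl
```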

Create an index mapping for the movies index in a file named config.yml:

inference:
  embedding:
    e5-small:
      provider: onnx # (1)
      model: nixiesearch/e5-small-v2-onnx # (2)
      prompt:
        query: "query: "
        doc: "passage: "
schema:
  movies: # index name
    fields:
      title: # field name
        type: text
        search:
          type: hybrid
          model: e5-small
        language: en # language is needed for lexical search
        suggest: true
      overview:
        type: text
        search:
          type: hybrid
          model: e5-small
        language: en

  1. We use ONNX Runtime for local embedding inference, but you can also use any API-based SaaS embedding provider.
  2. Any SBERT-compatible embedding model can be used, and you can convert your own.

Run the Nixiesearch Docker container:

docker run -itp 8080:8080 -v .:/data nixiesearch/nixiesearch:latest standalone -c /data/config.yml
a.nixiesearch.index.sync.LocalIndex$ - Local index movies opened
ai.nixiesearch.index.Searcher$ - opening index movies
a.n.main.subcommands.StandaloneMode$ - ███╗   ██╗██╗██╗  ██╗██╗███████╗███████╗███████╗ █████╗ ██████╗  ██████╗██╗  ██╗
a.n.main.subcommands.StandaloneMode$ - ████╗  ██║██║╚██╗██╔╝██║██╔════╝██╔════╝██╔════╝██╔══██╗██╔══██╗██╔════╝██║  ██║
a.n.main.subcommands.StandaloneMode$ - ██╔██╗ ██║██║ ╚███╔╝ ██║█████╗  ███████╗█████╗  ███████║██████╔╝██║     ███████║
a.n.main.subcommands.StandaloneMode$ - ██║╚██╗██║██║ ██╔██╗ ██║██╔══╝  ╚════██║██╔══╝  ██╔══██║██╔══██╗██║     ██╔══██║
a.n.main.subcommands.StandaloneMode$ - ██║ ╚████║██║██╔╝ ██╗██║███████╗███████║███████╗██║  ██║██║  ██║╚██████╗██║  ██║
a.n.main.subcommands.StandaloneMode$ - ╚═╝  ╚═══╝╚═╝╚═╝  ╚═╝╚═╝╚══════╝╚══════╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝
a.n.main.subcommands.StandaloneMode$ -
o.h.ember.server.EmberServerBuilder - Ember-Server service bound to address: [::]:8080

Build an index for a hybrid search:

curl -XPUT -d @movies.jsonl http://localhost:8080/movies/_index
{"result":"created","took":8256}
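To try the same step with your own data, the sketch below writes two made-up documents as JSONL and uploads them with the same PUT call. Several things here are assumptions rather than documented behavior: the document contents and `_id` values are invented, `NIXIE_URL` is a hypothetical environment variable, and the helper names are illustrative. It also assumes the endpoint accepts further documents for an existing index in the same format as the sample file.

```shell
# Hypothetical helper: write two sample documents, one JSON object per line.
# Field names follow the title/overview fields declared in config.yml;
# the ids and texts are made up for illustration.
write_sample_docs() {
  cat > "$1" <<'EOF'
{"_id": "1", "title": "Example Movie", "overview": "A made-up movie about search engines."}
{"_id": "2", "title": "Another Example", "overview": "A made-up sequel about ranking."}
EOF
}

# Hypothetical helper: upload a JSONL file to the movies index.
# --data-binary sends the file byte for byte; plain -d strips line breaks.
index_file() {
  curl -XPUT --data-binary "@$1" "${NIXIE_URL:-http://localhost:8080}/movies/_index"
}

# Usage, with the container from the previous step running:
#   write_sample_docs my-docs.jsonl
#   index_file my-docs.jsonl
```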

Send the search query:

curl -XPOST -d '{"query": {"match": {"title":"matrix"}},"fields": ["title"], "size":3}' \
    http://localhost:8080/movies/_search
{
  "took": 1,
  "hits": [
    {
      "_id": "605",
      "title": "The Matrix Revolutions",
      "_score": 0.016666668
    },
    {
      "_id": "604",
      "title": "The Matrix Reloaded",
      "_score": 0.016393442
    },
    {
      "_id": "624860",
      "title": "The Matrix Resurrections",
      "_score": 0.016129032
    }
  ],
  "aggs": {},
  "ts": 1722441735886
}
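Since config.yml declares overview as a hybrid field too, the same request shape should in principle work against it. The following is a small sketch that builds the request body for any field and posts it. The helper names and the `NIXIE_URL` variable are hypothetical, and the body builder does not escape quotes inside the query text.

```shell
# Hypothetical helper: build a match-query body in the same shape as the request above.
# Arguments: field name, query text, number of hits to return.
# Caveat: quotes or backslashes in the query text are not JSON-escaped.
build_match_query() {
  printf '{"query": {"match": {"%s": "%s"}}, "fields": ["title"], "size": %d}' "$1" "$2" "$3"
}

# Hypothetical helper: POST the body to the search endpoint of the movies index.
search_movies() {
  curl -s -XPOST -d "$(build_match_query "$1" "$2" "$3")" \
    "${NIXIE_URL:-http://localhost:8080}/movies/_search"
}

# Usage, with the container running and the index built:
#   search_movies overview "a hacker learns reality is a simulation" 3
```

Keeping the body builder separate from the HTTP call makes it easy to print and inspect the JSON before sending it.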

You can also open http://localhost:8080/_ui in your web browser for a basic web UI:


For more details, see the complete Quickstart guide.

This project is released under the Apache 2.0 license, as specified in the License file.
