Neural Search Starter

Introducing Mighty Starter, a complete, working neural search application kit that you can quickly spin up and apply to your own website or content.

It is fast and lightweight, and it provides an easy-to-use solution for site search. You host it in your own stack, on your own terms, without expensive SaaS fees. For public-facing informational websites and content, the license is completely free.

Why Neural Search?

Traditional keyword search stacks such as Elasticsearch and Solr have notoriously poor relevance out of the box, and they require experts to tinker with and tune the engine extensively to get decent results. Neural search is a good-enough solution for making your site search work like Google: it matches on meaning rather than exact keywords, so it delivers great relevance out of the box without fussing over synonyms or tuning.

Consider the query 'thermal blanket' run against the example Outdoors Stack Exchange content set included with the starter kit. The best results don't contain the word 'thermal', but they are found because the meaning of the query is understood and the relevant documents are surfaced. No tuning or synonyms are involved; the search just works.

[Figure: search results for the query 'thermal blanket' using the example outdoors content set]

Until now, getting started with neural search has been an enormous undertaking. It could take weeks or months to assemble various ad-hoc technologies before you had something you could demo, let alone run in production. Much of the tooling and knowledge is also scattered, so it is hard to know where to look.

Built from scratch, a neural search project involves assembling and tuning every one of the following layers:

  • Content acquisition and structure
  • Base model selection and testing
  • Inference runtime and model conversion
  • Vector search engine choice and config
  • Extract-transform-load (ETL) glue
  • Search UI and API
  • Docker composition
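To make the extract-transform-load glue a little more concrete, here is a minimal TypeScript sketch of the transform step, assuming each page has already been fetched and embedded. It shapes pages into the `{ id, vector, payload }` points that Qdrant's REST upsert endpoint (`PUT /collections/{name}/points`) accepts. The `Page` type, the payload fields and the `toPoint`/`toUpsertBody` helpers are hypothetical illustrations, not code taken from Mighty Starter.

```typescript
// Hypothetical shape of a crawled page after extraction (not from Mighty Starter).
interface Page {
  url: string;
  title: string;
  text: string;
}

// A Qdrant point: an id, a dense vector, and an arbitrary JSON payload.
interface QdrantPoint {
  id: number;
  vector: number[];
  payload: { url: string; title: string; text: string };
}

// Transform step: pair a page with its embedding under a given id.
function toPoint(page: Page, embedding: number[], id: number): QdrantPoint {
  return {
    id,
    vector: embedding,
    payload: { url: page.url, title: page.title, text: page.text },
  };
}

// Load-step input: Qdrant's upsert endpoint expects a body of the form { points: [...] }.
// Ids here are simply the array positions (an assumption for illustration).
function toUpsertBody(pages: Page[], embeddings: number[][]): { points: QdrantPoint[] } {
  if (pages.length !== embeddings.length) {
    throw new Error("each page needs exactly one embedding");
  }
  return { points: pages.map((p, i) => toPoint(p, embeddings[i], i)) };
}
```

Keeping the URL and title in the payload means a search hit can be rendered as a result link without a second lookup.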

Mighty Starter is meant to be the "set and forget" of search. It was created as a starter kit covering all of the layers above, and it provides an adaptable Docker Compose setup that you can quickly tailor to your own needs. If your team wants site search to "just work like Google" and doesn't have the time or budget to worry about search relevance, this project is for you.

You can use Mighty Starter to crawl your website's sitemap and have a complete search application up and running in minutes.

Get started now!

How it works

Mighty Starter is an assembly of curated and tuned technologies bundled into a Docker Compose file. Built around the speed and simplicity of Mighty Inference Server, it uses the Qdrant vector search engine as the backend for the search index, a Node.js Express application for the front end and API, and Mighty Batch for content processing. We also added some small orchestration tooling to round it out into a complete site-search platform.

Qdrant is a fast and lightweight vector search engine that can be used for semantic search. To use Qdrant, all you need to do is send your vectors to the engine and then query with another vector to get the most similar results. But where do these vectors come from? That's where Mighty Inference Server comes in.
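Conceptually, "most similar" often means the highest cosine similarity between the query vector and each stored vector. The sketch below ranks a few stored vectors by brute force to show the idea. The three-dimensional vectors and document ids are made up for illustration (real embeddings have hundreds of dimensions). Qdrant offers cosine as one of its distance metrics, but it uses an approximate HNSW index rather than scanning every vector, so treat this as an explanation of the scoring, not of how Qdrant is implemented.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Stored {
  id: string;
  vector: number[];
}

// Exhaustive (linear-scan) top-k search: score every stored vector, keep the best k.
function topK(query: number[], stored: Stored[], k: number): { id: string; score: number }[] {
  return stored
    .map((s) => ({ id: s.id, score: cosine(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy corpus with hypothetical embeddings.
const corpus: Stored[] = [
  { id: "emergency-blanket", vector: [0.9, 0.1, 0.0] },
  { id: "sleeping-bag", vector: [0.7, 0.3, 0.1] },
  { id: "trail-map", vector: [0.0, 0.2, 0.9] },
];

// Hypothetical query embedding pointing mostly along the first axis.
console.log(topK([1, 0, 0], corpus, 2).map((h) => h.id).join(", "));
// prints: emergency-blanket, sleeping-bag
```

Neither top result needs to share a word with the query; only the direction of the vectors matters, which is why the 'thermal blanket' example works without synonyms.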

Mighty Inference Server is a production-ready microservice that converts unstructured text into embeddings (vectors). It's fast and lightweight, and it doesn't require expensive GPUs. You can query it directly from a browser or from a backend of your choice.

Mighty Batch processes your site content into vectors that can be indexed. It is a flexible tool that takes content from the source you specify and gets the vectors from Mighty Inference Server. Give it a set of JSON documents or a sitemap, and it does the heavy lifting of processing them all for you. See the eCFR case study to learn more about how Mighty Batch works together with Mighty Inference Server.
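Mighty Batch handles sitemap reading for you, but it helps to see what that step involves. Under the sitemaps.org protocol, a sitemap lists page URLs in `<loc>` elements, with characters such as `&` entity-escaped. The following TypeScript sketch pulls those URLs out of a sitemap string. The function names are hypothetical, this is not Mighty Batch's implementation, and a production crawler should use a real XML parser (this regex ignores namespace-prefixed tags, for instance).

```typescript
// Decode the five XML entities the sitemap protocol requires URLs to escape.
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

// Extract every <loc> URL from a sitemap (or sitemap index) XML string,
// trimming surrounding whitespace inside the element.
function sitemapUrls(xml: string): string[] {
  const urls: string[] = [];
  const re = /<loc>\s*([\s\S]*?)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    urls.push(decodeXmlEntities(m[1]));
  }
  return urls;
}
```

Each extracted URL would then be downloaded, stripped down to its text and sent for embedding, which is the work Mighty Batch automates.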

Express is the world's most popular Node.js framework for API backends. We built some Node modules to make it easy to combine Express with Mighty (https://www.npmjs.com/package/node-mighty) and Qdrant (https://www.npmjs.com/package/qdrant). The starter Express app comes with a /search route and a lightweight example web page for the search UI.
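The following sketch illustrates two pieces of work a /search route has to do around the embedding call: validate the user's query parameters, then build a request for Qdrant's REST search endpoint (`POST /collections/{name}/points/search`, whose body takes `vector`, `limit` and `with_payload`). The helper names, the `q`/`limit` parameter names and the default and maximum limits are assumptions for illustration. The starter app's actual route builds on the node-mighty and qdrant modules and may be structured differently.

```typescript
// Body for Qdrant's REST search endpoint: POST /collections/{name}/points/search
interface QdrantSearchBody {
  vector: number[];
  limit: number;
  with_payload: boolean;
}

// Hypothetical helper: validate `q` and `limit` (as an Express route might
// receive them in req.query) and clamp the result count to [1, maxLimit].
function parseSearchParams(
  params: { q?: string; limit?: string },
  maxLimit = 50,
): { q: string; limit: number } {
  const q = (params.q ?? "").trim();
  if (q.length === 0) throw new Error("missing query parameter 'q'");
  const raw = Number.parseInt(params.limit ?? "10", 10);
  const limit = Number.isNaN(raw) ? 10 : Math.min(Math.max(raw, 1), maxLimit);
  return { q, limit };
}

// Hypothetical helper: once the query text has been embedded (for example by
// Mighty Inference Server), build the search request for Qdrant.
function buildSearchBody(embedding: number[], limit: number): QdrantSearchBody {
  return { vector: embedding, limit, with_payload: true };
}
```

Clamping `limit` keeps a single request from asking Qdrant for an arbitrarily large result set, and `with_payload: true` returns the stored page metadata alongside each score.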

Installation

The only things you need are Docker and a recent version of Node.js (all the code was tested on v16).

Clone https://github.com/maxdotio/node-mighty-qdrant-starter, then start the servers with `docker compose up`.

With Mighty Starter running, you can infer and index your own website's content by running `./website.sh [name] [https://example.com/sitemap.xml]`, where `[name]` is any name you choose and the example sitemap URL is replaced with your own. Downloading and indexing take about half a second per document, so if you have a lot of content you'll need to be patient.
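To plan for that wait, you can turn the half-second figure into a rough estimate. This small sketch treats 0.5 seconds per document as a fixed rate, which is an assumption: real throughput depends on your hardware, page sizes and network. The function name is hypothetical.

```typescript
// Rough wall-clock estimate for indexing, assuming a fixed per-document rate
// (default 0.5 s, the figure quoted above).
function estimateIndexingMinutes(docCount: number, secondsPerDoc = 0.5): number {
  if (docCount < 0 || secondsPerDoc < 0) throw new Error("inputs must be non-negative");
  return (docCount * secondsPerDoc) / 60;
}

// Example: a 2,000-page site at 0.5 s per page is 1,000 s, about 17 minutes.
console.log(estimateIndexingMinutes(2000).toFixed(1)); // prints 16.7
```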

With the application running, just navigate to http://localhost:8000/ and try some queries!
