Building a search engine
So I built a search engine. In two and a half weeks. During Christmas.
Yes, I have a life. Thanks for asking!
Following a course on Brilliant, I decided to code my own full-text search engine. How hard can it be? 3K lines of code in (for just the backend), I can confirm: it’s educational, but a pain in the ass.
The basics
What search engines have in common is their index. That’s how they quickly know where to look for the searched content.
They achieve this by storing a map of words and the documents they appear in (and sometimes also exactly where each occurrence is). Now, we can quickly (by indexing into the map) query where the word we’re looking for is. Then, read the data and return the hit with context.
I’ve expanded these principles a bit to provide a nicer searching experience, what you’ve grown used to in DuckDuckGo and Google.
Goals
Like any good project manager, I set up some goals before starting the theory work and coding.
- Prioritize memory/disk use over performance. This implies searching through each of the matching files, for every query.
- Typo tolerance. You should be able to get results if you search for the start of a word or if you missed some characters.
- Fast enough for updating the results for each key press. The server shouldn’t be the bottleneck, rather the network is.
- Rank hits:
  - How close the used word is to the guessed one (using typo tolerance).
  - AND occurrences closer together are better.
  - AND NOT occurrences closer together are worse.
  - More links to the page are better.
I’ve implemented all of these goals, except the last part of the rating. I’m simply not crawling the site. I’ll get back to that below.
Platform
I’m building a platform-agnostic search engine. I’ll integrate it with Kvarn, to provide a simple way to add search to any website.
That means I’m writing it in Rust, a systems language, which gives me absolute control over how data behaves and how I search it.
I’ll be using no dependencies, except for a string similarity library and a library providing set operations for sorted iterators. Speaking of that confusing jargon…
Efficient logic
Queries are split up into string segments contained in logic primitives.
hello world
becomes
Part::And(Part::String("hello"), Part::String("world")).
icelk -(kvarn or agde)
becomes
Part::And(Part::String("icelk"), Part::Not(Part::Or(Part::String("kvarn"), Part::String("agde")))).
Parsing the query actually took like 10 hours of programming. Parsers are hard :|
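For reference, a minimal sketch of what the query AST could look like, based only on the examples above (the exact shape of the real type, including the Box-ing, is my assumption):

```rust
// A sketch of the query AST implied by the examples above.
enum Part {
    String(String),
    And(Box<Part>, Box<Part>),
    Or(Box<Part>, Box<Part>),
    Not(Box<Part>),
}

// "icelk -(kvarn or agde)" would then parse to:
// Part::And(
//     Box::new(Part::String("icelk".into())),
//     Box::new(Part::Not(Box::new(Part::Or(
//         Box::new(Part::String("kvarn".into())),
//         Box::new(Part::String("agde".into())),
//     )))),
// )
```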
Then, I get the documents in which each single string occurs. That returns a sorted map, covered in the next chapter. I then iterate over the occurrences and use the aforementioned library to do logic on them.
It’s actually quite intuitive. Say the first item in the AND statement (hello) has occurrences in documents 1, 2 & 3, and the second has them in 1, 3 & 4. We iterate over the items. If the item of iterator a > the item of iterator b, take the next value of b. Then, if they are equal (the same document ID), we return the result.
If we store the iterators a and b in the iterator object, we can continue iteration the next time we’re requested to give a document ID.
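As a rough illustration (not the actual library code), an AND iterator over two sorted streams of document IDs could look something like this:

```rust
use std::iter::Peekable;

// A sketch of the AND (intersection) logic described above, assuming both
// inputs yield document IDs in sorted order.
struct AndIter<A: Iterator<Item = u64>, B: Iterator<Item = u64>> {
    a: Peekable<A>,
    b: Peekable<B>,
}

impl<A: Iterator<Item = u64>, B: Iterator<Item = u64>> Iterator for AndIter<A, B> {
    type Item = u64;
    fn next(&mut self) -> Option<u64> {
        loop {
            let a = *self.a.peek()?;
            let b = *self.b.peek()?;
            if a < b {
                // `a` is behind: advance it.
                self.a.next();
            } else if b < a {
                // `b` is behind: advance it.
                self.b.next();
            } else {
                // Same document ID: advance both and return the match.
                self.a.next();
                self.b.next();
                return Some(a);
            }
        }
    }
}
```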
In a similar fashion, OR and NOT operations are handled.
NOT operations can, however, not be alone. Then, we’d have an iterator of all other documents. That’s not feasible. Or efficient. Only AND NOT queries can be resolved. They use the difference between a and b: iterate the items in a and check if each item is in b. This presented a challenge that took me long to solve. (foreshadowing)
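A sketch of that difference operation, again over sorted iterators of document IDs (not the library’s exact function):

```rust
// Keep document IDs from `a` that don't appear in `b`. Both iterators are
// assumed to be sorted; this mirrors the AND NOT description above.
fn and_not(
    a: impl Iterator<Item = u64>,
    b: impl Iterator<Item = u64>,
) -> impl Iterator<Item = u64> {
    let mut b = b.peekable();
    a.filter(move |doc| {
        // Skip NOT documents that lie before the current AND document.
        while b.peek().map_or(false, |not| not < doc) {
            b.next();
        }
        // Keep the document only if the NOT side doesn't contain it.
        b.peek() != Some(doc)
    })
}
```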
Data structures - binary trees
So how do we store the lists? They need to be sorted and, in the index, quickly accessible. We don’t want to use hash maps, as they come with a lengthy hashing process for each lookup, and they aren’t sorted. That’s important for optimizations.
Here comes our hero, the binary tree! It acts like a list that sorts its content on insertion. When you then get an item, it’ll check the middle of the list. If the key is less than the middle, it’ll check the middle of the first half. And so it continues. Instead of the O(n) lookup time of a list, binary trees have a lookup time of O(log n). In English, that means FAST when the size is big. At 5 items, it takes up to 5 comparisons for a list, and at most 3 for a binary tree. But at 1000 items, it’ll take up to 1000 comparisons for the list, and only about 10 for the binary tree!
I’ve described them as lists. That’s not true. They store their data in multiple segments. That means you won’t have to move ALL the elements of a list if your new item has to be at index 0.
We use the iterators of these to get the sorted iterators.
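In Rust, this is the standard library’s BTreeMap. A tiny illustration of the sorted iteration the set operations rely on:

```rust
use std::collections::BTreeMap;

// BTreeMap keeps its keys sorted, so iterating it yields exactly the kind
// of sorted iterator the set operations above need.
fn main() {
    let mut index: BTreeMap<&str, Vec<u32>> = BTreeMap::new();
    index.insert("world", vec![2]);
    index.insert("hello", vec![1, 2]);
    // Prints "hello" before "world", regardless of insertion order.
    for (word, documents) in &index {
        println!("{word}: {documents:?}");
    }
}
```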
Ignoring certain letters
To get everybody on the same page: a character is any letter, symbol, or so-called control character (e.g. a line break) you can use on a computer.
We don’t want to include some of these characters in our index or words. That means we have to convert all other pieces of text we use to this format too. But rest assured, iterators are our hero again. I’ve created an iterator which ignores the unwanted characters. Then, we equate and compare those iterators. This is all encapsulated in a convenient object (called a struct in Rust).
The reason is typo tolerance and consistency. If we have an apostrophe at the end, as in friends', we probably want to treat the word the same as without it. Also, we remove the issue of punctuation directly after a word being treated as part of it.
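A minimal sketch of the idea (the real struct wraps this more conveniently, and the exact set of ignored characters here is my guess):

```rust
// Compare two words while ignoring punctuation such as trailing apostrophes.
// Keeping only alphanumeric characters is an assumption; the real filter may differ.
fn clean_chars(word: &str) -> impl Iterator<Item = char> + '_ {
    word.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(char::to_lowercase)
}

fn words_equal(a: &str, b: &str) -> bool {
    // No allocations: the iterators are compared element by element.
    clean_chars(a).eq(clean_chars(b))
}

// words_equal("friends'", "friends") == true
```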
SPEEED
To increase performance, I first diagnosed the timings of the library.
As to be expected, the searching of occurrences in files took quite a bit of time. That can’t be avoided. I use a splitting iterator, and then a binary tree to look up if a word is in our wanted list.
When getting the HTML documents from Kvarn and parsing them, it turned out the big performance hog was the HTML parsing. Shocker. So I implemented a cache for the processed HTML documents (plain text to search) which resets every 10 minutes.
If the total count of words is too large, I only check some of them for proximity. More on that later.
This all took response times (in a debug build, where performance is quite bad, but it highlights this well) from up to 500ms down to a measly 25ms.
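Conceptually, that plain-text cache is just a map from path to extracted text plus a timestamp. A rough sketch (names and structure are my own, not the integration’s actual types):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// A time-based cache for extracted plain text, assuming the 10-minute
// lifetime mentioned above.
struct TextCache {
    lifetime: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TextCache {
    fn new() -> Self {
        Self { lifetime: Duration::from_secs(10 * 60), entries: HashMap::new() }
    }
    fn get(&self, path: &str) -> Option<&str> {
        self.entries
            .get(path)
            // Only return entries younger than the lifetime.
            .filter(|(created, _)| created.elapsed() < self.lifetime)
            .map(|(_, text)| text.as_str())
    }
    fn insert(&mut self, path: String, text: String) {
        self.entries.insert(path, (Instant::now(), text));
    }
}
```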
Unexpected caveats
Testing my brand spanking new search engine, I noticed something. Some queries that should have yielded obvious results (I had them in front of me on the website) were missing hits completely. And I don’t filter any hits. Oh shit!
After hours of debugging, I came to a conclusion. Not an epiphany, simply slowly connecting the dots. That’s what most often happens. You realize how dumb you are.
In this case, it was the typo tolerance which caused the issue. See, when I get the documents containing a word, I look up every word similar to the target serially. That means the document IDs aren’t sorted in the iterator. That breaks the system.
The solution was to create a binary tree set from the iterator, and then iterate that. This method isn’t optimal. We allocate for temporary values. Another solution I might pursue is only using the word most proximate to the user input.
Another oddity was AND NOT not working. That was rooted in the fact that the closest occurrence from the NOT iterator should lower the rating of the AND item.
Using the aforementioned iterator approach with one NOT occurrence in a document with several AND occurrences resulted in only the first AND occurrence getting a lower rating.
This can be fixed by controlling when items are removed from the iterator. If the AND and NOT are on the same document, we peek the next NOT occurrence. If that’s closer, use it. Repeat the previous step. Then, don’t remove the item from the iterator. To achieve this, I’d have to modify the functions in the library providing the set operations on iterators.
This section was added later in the search engine’s development.
The issue above was resolved by turning the NOT part into a BTreeSet. For each AND occurrence, we then check the BTreeSet. This is bad, as intermediate allocations can get large. I recently fixed this by writing a function which looks for the “closest” pairs in ordered iterators. That also improved relevance, as closer NOT occurrences can be found.
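Something along these lines (my own illustration of the “closest pairs” idea, not the library’s actual function), assuming sorted position lists within one document:

```rust
// For each AND occurrence, find the distance to the closest NOT occurrence.
// Both position lists are assumed sorted and from the same document.
fn closest_not(and_positions: &[usize], not_positions: &[usize]) -> Vec<Option<usize>> {
    let mut cursor = 0;
    and_positions
        .iter()
        .map(|&pos| {
            // Advance the cursor while the following NOT occurrence is still before `pos`.
            while cursor + 1 < not_positions.len() && not_positions[cursor + 1] <= pos {
                cursor += 1;
            }
            // The closest NOT occurrence is either at the cursor or right after it.
            [not_positions.get(cursor), not_positions.get(cursor + 1)]
                .into_iter()
                .flatten()
                .map(|&not| not.abs_diff(pos))
                .min()
        })
        .collect()
}
```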
Another issue arose when we search for next gen in a document which has 5 occurrences of next but only one occurrence of generation. This causes the AND iterators to emit the first occurrence of next together with the only generation, even if that generation is closer to a later next. Here, I use the same function as mentioned above. When the first and only generation is iterated, it notices there are no more left, and therefore holds on to it. We now get more results, which are given a rating based on the proximity of the other AND words. This solves the problem without having much of a performance impact.
Index
Before making queries, you need to feed the engine with data (the documents). Now, it looks at every word in every document. For every word, the document is appended to the entry of the word in the map. If the entry already contains the document, it isn’t added.
The map is a binary tree, and the entry we spoke about is a binary tree set. That gives us a sorted iterator of documents, and guarantees there aren’t duplicates (that’s part of the formal definition of a set).
I’ll come back to why the word map isn’t a hash map.
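In code terms, the core of the index looks roughly like this (the names and the whitespace splitting are my own simplifications, not the library’s real types):

```rust
use std::collections::{BTreeMap, BTreeSet};

type DocumentId = u64;

// A sketch of the index layout described above: a sorted map from each word
// to the sorted set of documents containing it.
#[derive(Default)]
struct Index {
    words: BTreeMap<String, BTreeSet<DocumentId>>,
}

impl Index {
    fn digest(&mut self, document: DocumentId, text: &str) {
        // Splitting on whitespace is a simplification of the real word splitting.
        for word in text.split_whitespace() {
            // The set ignores duplicates, so re-adding the same document is a no-op.
            self.words.entry(word.to_lowercase()).or_default().insert(document);
        }
    }
}
```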
We’ve been talking about documents a lot. But do I store the name of each document in all the maps? No.
The two main engine-related objects of my search engine library are Simple and DocumentMap. Simple got its name from the type of index (the antagonist being lossless). The DocumentMap contains maps between internal Ids and their corresponding names. The ids are used for storage, but the map provides fast methods to get the name of a document.
Typo tolerance
Wanting solid typo tolerance available by default, I opted for a method independent of language, namely string similarity.
I iterate every recorded word (which we can do with trusted data; if a user could write to the index, they could flood it with thousands of words, slowing this part way down) and get the similarity. If it’s above a threshold, I return it from the iterator. Now, another iterator, which contains the aforementioned one, iterates all the documents for each accepted word.
This enables several similar words to give hits. That can be important when words from non-Latin alphabets are transliterated (e.g. Chernobyl), as they can have numerous variations in spelling.
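A sketch of that lookup. I’m assuming the strsim crate’s jaro_winkler here (the post doesn’t tie itself to a specific similarity library), and the threshold is made up:

```rust
// Words in the index that are similar enough to the query word.
// Assumes `strsim = "0.11"` in Cargo.toml; the threshold is arbitrary.
fn similar_words<'a>(
    index_words: impl Iterator<Item = &'a str> + 'a,
    query: &'a str,
) -> impl Iterator<Item = (&'a str, f64)> + 'a {
    const THRESHOLD: f64 = 0.85;
    index_words
        .map(move |word| (word, strsim::jaro_winkler(word, query)))
        .filter(|&(_, similarity)| similarity >= THRESHOLD)
}
```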
Ignoring some of the words
But what if we have 10,000 words in our index?
Then, we only compare the words which start with the same character as the word from the query. This often reduces the number of words to compare by 10x.
Now, it’s important that the index is a binary tree map. All items are then sorted, and we can take a range of it containing the words starting with a specific character. Then we don’t even have to check each word’s first character, as we know it matches.
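This is where the sorted map pays off. A rough sketch of narrowing the candidates to words sharing the query’s first character, using the range support of the standard BTreeMap:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Only consider indexed words starting with the same character as the query.
// The map is sorted, so we can jump to the first such word and stop at the last.
fn candidate_words<'a>(
    index: &'a BTreeMap<String, BTreeSet<u64>>,
    query: &str,
) -> impl Iterator<Item = &'a str> + 'a {
    let first = query.chars().next().expect("empty query");
    index
        .range(first.to_string()..)
        .take_while(move |(word, _)| word.starts_with(first))
        .map(|(word, _)| word.as_str())
}
```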
Ratings
We often get > 5 hits for simple queries such as icelk kvarn. How do we know which one is the best? From the origin of the occurrence (when it’s found in a document), it gets a rating attached to it. I modify the rating in several simple steps.
- Lower it if the word is less similar to the one in the query (this is the similarity I get from the library).
- If two occurrences are within 100 characters, they’re merged and get a higher rating.
- If a NOT occurrence is found in the same document, decrease the rating the closer it is.
- If an AND occurrence is in the same document, increase the rating the closer it is.
This provides good relevance. On complex queries, it’s preferred that the occurrences of all the words are close.
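Put together, the adjustments listed above amount to something like this (the constants, formulas, and names are placeholders of mine, not the engine’s actual numbers):

```rust
// A sketch of the rating adjustments. `similarity` comes from the typo
// tolerance; the distances come from the "closest pair" search described earlier.
struct Hit {
    rating: f32,
}

fn adjust_rating(
    hit: &mut Hit,
    similarity: f32,            // 0.0..=1.0, lower for worse word matches
    closest_and: Option<usize>, // distance to the closest AND occurrence
    closest_not: Option<usize>, // distance to the closest NOT occurrence
) {
    // Less similar words lower the rating.
    hit.rating *= similarity;
    // Closer AND occurrences raise the rating more.
    if let Some(distance) = closest_and {
        hit.rating += 1.0 / (1.0 + distance as f32);
    }
    // Closer NOT occurrences lower the rating more.
    if let Some(distance) = closest_not {
        hit.rating -= 1.0 / (1.0 + distance as f32);
    }
}
```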
Kvarn integration
Lastly, let’s take a look at the Kvarn integration.
This provides an easy way to use this search engine on your site.
To give clients access to the data, I thought it’d be best to use an API. Then, you (the JS programmer), can call the API and implement the frontend yourself. The response is in JSON and documented at my API reference.
Aside from returning the data, it also manages the index, both creating it on startup (it takes only ~20ms for this whole site, even with a massive article) and updating it when files are moved/created/modified/removed.
It also provides HTML parsing (to get plaintext to search), a cache (as HTML parsing takes a LONG time), and a simple configuration interface.
To get the list of pages on the website, the integration uses both the file system (if that’s enabled in the host) and the list of single-page extensions. This means no reverse-proxy, or more generally, no function-bound extensions, are indexed. PHP documents are, however (as they exist on the disk). If you want to index documents from the exceptions above, use the Options.additional_paths option of the integration.
Getting the responses from Kvarn is super simple. The handle_cache function takes a request, the remote IP address, and the host, and returns a response. It’s so nice to work with a good framework! (I’m not patting myself on the back, what are you talking about?)
Signing off, your 10x developer.