PyGenSearch

Powerful Python Search Engine SDK

Search is not just a feature – it’s the foundation of modern digital experiences.
PyGenSearch was born to save you time: writing a production-ready search engine from scratch is hard and distracting. Drop PyGenSearch into your app and get a ready-to-go, blazing-fast search engine in minutes.

Why PyGenSearch?

| Pain Without PyGenSearch | Gain With PyGenSearch |
|---|---|
| Weeks building bespoke indexing | Minutes to production-ready search |
| Complex scoring & ranking logic | Sensible defaults, easy overrides |
| Scaling headaches when data grows | Memory-efficient index & async I/O |
| Boilerplate for every framework | First-class adapters for Flask, FastAPI, Django & more |
| Limited dev hours diverted from core product | Focus on your product, not search internals |

PyGenSearch fits perfectly when building:

  • E-commerce (product discovery & faceted filtering)

  • Content management systems (blog, docs, news)

  • Internal tools & dashboards

  • Knowledge bases / support portals

  • Social & community platforms


Key Features

Core Capabilities

| Feature | Details |
|---|---|
| Lightning-Fast Search | Optimized in-memory inverted index built with Cython; sub-millisecond look-ups on ~50k docs. |
| Smart Matching | Fuzzy matching, typo-tolerance, n-gram & prefix queries, configurable similarity thresholds. |
| Relevance Ranking | TF-IDF & BM25 scoring out-of-the-box, custom weighting per field, boost hooks. |
| Live Indexing | Add / update / delete documents at runtime without blocking searches. |
| Developer Experience | Typed API, rich docstrings, Pydantic models, async/await variants, exhaustive examples. |
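
Smart Matching lists n-gram queries with configurable similarity thresholds. The sketch below illustrates that general idea in plain Python, assuming character bigrams compared with Jaccard similarity. The function names and the 0.5 threshold are hypothetical and are not part of PyGenSearch's API; the library's actual matching may work differently.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of `word`, padded with '$' markers."""
    padded = f"${word.lower()}$"
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=2):
    """Jaccard similarity between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_lookup(query, vocabulary, threshold=0.5):
    """Return (term, score) pairs scoring at least `threshold`, best first."""
    scored = [(term, similarity(query, term)) for term in vocabulary]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


# Illustrative vocabulary taken from the tags in the Basic Usage example.
vocabulary = ["python", "programming", "beginners"]
matches = fuzzy_lookup("pythn", vocabulary)  # "pythn" is a typo for "python"
print(matches)
```

Raising the threshold trades recall for precision: fewer typos are tolerated, but fewer unrelated terms slip through.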

Quick Start

Installation

pip install pygen-search

Basic Usage

from pygen_search import PyGenSearch

documents = [
    {
        "id": 1,
        "title": "Getting Started with Python",
        "content": "Python is a versatile programming language loved by millions...",
        "tags": ["programming", "python", "beginners"],
        "category": "education",
        "date_published": "2024-09-10"
    },
    # ... more docs ...
]

engine = PyGenSearch(
    data=documents,
    searchable_fields=["title", "content", "tags"]
)

print(engine.search("python programming"))

Core Concepts

Search Architecture

  1. Indexing Layer

    • Text preprocessing (lower-casing, stop-word removal, stemming/lemmatization).

    • Token extraction ➜ n-grams + prefixes for fuzzy matching.

    • Inverted index stored in compressed postings lists.

    • Configurable per-field boosts and weights.

  2. Query Processing

    • Query parsing (phrase, boolean, wildcard).

    • Candidate set retrieval (skip-lists for fast seeking).

    • Scoring (BM25 by default) ➜ boost hooks ➜ post-filtering.

    • Faceting & pagination.
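
The two layers above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not PyGenSearch's internal implementation: `ToyIndex`, `tokenize`, the stop-word list, and the `k1`/`b` defaults are all hypothetical, and compression, skip-lists, stemming, and faceting are left out. It shows preprocessing, an inverted index of postings, and BM25 scoring over the candidate set.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "by", "is", "of", "the", "with"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


class ToyIndex:
    """Illustrative inverted index with BM25 ranking."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, limit=10):
        n_docs = len(self.doc_len)
        if n_docs == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n_docs
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                # Length normalization: long documents are penalized slightly.
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:limit]


index = ToyIndex()
index.add(1, "Getting Started with Python")
index.add(2, "Advanced tricks with Python")
index.add(3, "Machine Learning is fun")
hits = index.search("python tricks")
print(hits)  # list of (doc_id, score) pairs, best match first
```

Document 2 matches both query terms and therefore ranks above document 1, which matches only "python"; document 3 matches neither term and is not returned.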

Data Flow

Integration Guides

Flask API Example

from flask import Flask, request, jsonify
from pygen_search import PyGenSearch

app = Flask(__name__)

data = [
    {"id": 1, "title": "Learn AI Today", "desc": "Machine Learning is fun.", "category": "tech"},
    {"id": 2, "title": "Python Tips", "desc": "Advanced tricks with Python.", "category": "programming"},
]
engine = PyGenSearch(data=data, searchable_fields=["title", "desc"])

@app.route("/search")
def search():
    query = request.args.get("q", "")
    results = engine.search(query)
    return jsonify(results)

if __name__ == "__main__":
    app.run(debug=True)

Integrate with HTML/JS Frontend

You can create a simple frontend that calls your Flask API:

HTML/JS Example:

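<input id="search" placeholder="Search...">
<ul id="results"></ul>
<script>
const resultsList = document.getElementById('results');
document.getElementById('search').addEventListener('input', async function () {
    const res = await fetch('/search?q=' + encodeURIComponent(this.value));
    if (!res.ok) return;  // keep the previous results if the request fails
    const data = await res.json();
    // Build <li> nodes with textContent so titles are never parsed as HTML.
    resultsList.replaceChildren(...data.map(item => {
        const li = document.createElement('li');
        li.textContent = item.title;
        return li;
    }));
});
</script>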

Integrate with React/Next.js

In your React or Next.js app, call your Flask API:

// Example React component
import { useState } from "react";

function Search() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState([]);

  async function handleSearch(e) {
    setQuery(e.target.value);
    const res = await fetch(`/search?q=${encodeURIComponent(e.target.value)}`);
    const data = await res.json();
    setResults(data);
  }

  return (
    <div>
      <input value={query} onChange={handleSearch} placeholder="Search..." />
      <ul>
        {results.map(item => <li key={item.id}>{item.title}</li>)}
      </ul>
    </div>
  );
}

export default Search;

Advanced Usage

Custom Tokenizers

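from nltk.stem import SnowballStemmer
from pygen_search import PyGenSearch

stemmer = SnowballStemmer("english")

def custom_tokenizer(text: str) -> list[str]:
    # Lower-case, split on whitespace, then reduce each token to its stem.
    return [stemmer.stem(tok) for tok in text.lower().split()]

# `documents` is the list of dicts defined in Basic Usage.
engine = PyGenSearch(
    data=documents,
    tokenizer=custom_tokenizer
)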

Performance Optimization

| Technique | When to Use | Gains |
|---|---|---|
| Batch Indexing (batch_size) | Large initial corpus | 1.5-2× faster indexing |
| Cython Build (pip install pygen-search[cython]) | CPU-bound search | 3-7× query speed-up |
| Memory Mapping (mmap index file) | Multi-process web servers | Shared index ≤ RAM |
| Sharding (engine.split(shards=4)) | >10M docs | Scales horizontally |
| Prefetch Cache (LRUCache) | Hot query patterns | 60-80% latency drop |

Tip: Profile first! engine.diagnostics.profile(query="...") prints token hit stats, posting list scans, and ranking cost.
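
The caching row in the table can be approximated outside the library as well. The sketch below is an assumption-laden illustration using the standard library's functools.lru_cache rather than PyGenSearch's own LRUCache, whose interface is not shown here. `make_cached_search` and `slow_search` are hypothetical names; in a real app the wrapped function would be something like `engine.search`.

```python
from functools import lru_cache


def make_cached_search(search_fn, maxsize=1024):
    """Wrap search_fn(query) so repeated queries are served from an LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(normalized_query):
        # Store a tuple so the cached sequence itself cannot be mutated.
        return tuple(search_fn(normalized_query))

    def search(query):
        # Normalize case and whitespace so near-identical queries share an entry.
        return list(cached(" ".join(query.lower().split())))

    # Expose the cache controls; clear the cache whenever the index changes.
    search.cache_info = cached.cache_info
    search.cache_clear = cached.cache_clear
    return search


calls = []


def slow_search(query):
    """Stand-in for a real search call; records each query it receives."""
    calls.append(query)
    return [{"id": 2, "title": "Python Tips"}]


search = make_cached_search(slow_search, maxsize=128)
first = search("Python  tips")   # cache miss: calls slow_search
second = search("python tips")   # cache hit: slow_search is not called again
print(search.cache_info())
```

With live indexing, cached results go stale after documents are added, updated, or deleted, so call `search.cache_clear()` after each index change or accept a short window of stale results.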


Contributing

We 💜 contributions! To get started:

  1. Fork the repo and create your branch: git checkout -b feat/my-feature

  2. Install the lint hooks (pre-commit install), then commit your changes.

  3. Write tests in tests/ (pytest).

  4. Submit a PR – GitHub Actions will run CI automatically.


License

PyGenSearch is released under the MIT License. See the full text in LICENSE.

Made with ❤️ by PyGen Labs – because every app deserves great search.
