PyGenSearch SDK Documentation – Fast & Scalable Python Search Engine Library

Search is not just a feature – it’s the foundation of modern digital experiences.
PyGenSearch was born to save you time: writing a production-ready search engine from scratch is hard and distracting. Drop PyGenSearch into your app and get a ready-to-go, blazing-fast search engine in minutes.

Why PyGenSearch?

Pain Without PyGenSearch	Gain With PyGenSearch
Weeks building bespoke indexing	Minutes to production-ready search
Complex scoring & ranking logic	Sensible defaults, easy overrides
Scaling headaches when data grows	Memory-efficient index & async I/O
Boilerplate for every framework	First-class adapters for Flask, FastAPI, Django & more
Limited dev hours diverted from core product	Focus on your product, not search internals

PyGenSearch fits perfectly when building:

E-commerce (product discovery & faceted filtering)
Content management systems (blog, docs, news)
Internal tools & dashboards
Knowledge bases / support portals
Social & community platforms

Key Features

Core Capabilities

Feature	Details
Lightning-Fast Search	Optimized in-memory inverted index built with Cython; sub-millisecond look-ups on ~50k docs.
Smart Matching	Fuzzy matching, typo-tolerance, n-gram & prefix queries, configurable similarity thresholds.
Relevance Ranking	TF-IDF & BM25 scoring out-of-the-box, custom weighting per field, boost hooks.
Live Indexing	Add / update / delete documents at runtime without blocking searches.
Developer Experience	Typed API, rich docstrings, Pydantic models, async/await variants, exhaustive examples.

Quick Start

Installation

pip install pygen-search

Basic Usage

from pygen_search import PyGenSearch

documents = [
    {
        "id": 1,
        "title": "Getting Started with Python",
        "content": "Python is a versatile programming language loved by millions...",
        "tags": ["programming", "python", "beginners"],
        "category": "education",
        "date_published": "2024-09-10"
    },
    # ... more docs ...
]

engine = PyGenSearch(
    data=documents,
    searchable_fields=["title", "content", "tags"]
)

print(engine.search("python programming"))

Core Concepts

Search Architecture

Indexing Layer
- Text preprocessing (lower-casing, stop-word removal, stemming/lemmatization).
- Token extraction ➜ n-grams + prefixes for fuzzy matching.
- Inverted index stored in compressed postings lists.
- Configurable per-field boosts and weights.
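To make the indexing steps above concrete, here is a minimal pure-Python sketch, not PyGenSearch's actual implementation. The names `preprocess`, `char_ngrams`, `build_index`, and `fuzzy_candidates` are hypothetical, as is the stop-word list. The sketch skips stemming, prefixes, postings compression, and per-field boosts.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption, not the library's list).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "with"}

def preprocess(text):
    """Lower-case, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

def char_ngrams(token, n=3):
    """Character n-grams of a token; tokens of length <= n yield themselves."""
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]

def build_index(docs, fields):
    """Return (token -> sorted doc ids, n-gram -> set of indexed tokens)."""
    postings = defaultdict(set)
    ngram_index = defaultdict(set)
    for doc in docs:
        for field in fields:
            for token in preprocess(str(doc.get(field, ""))):
                postings[token].add(doc["id"])
                for gram in char_ngrams(token):
                    ngram_index[gram].add(token)
    return {t: sorted(ids) for t, ids in postings.items()}, dict(ngram_index)

def fuzzy_candidates(query_token, ngram_index):
    """Indexed tokens sharing n-grams with the query token, most overlap first."""
    shared = Counter()
    for gram in char_ngrams(query_token):
        for token in ngram_index.get(gram, ()):
            shared[token] += 1
    return [tok for tok, _ in sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))]

docs = [
    {"id": 1, "title": "Learn AI Today", "desc": "Machine Learning is fun."},
    {"id": 2, "title": "Python Tips", "desc": "Advanced tricks with Python."},
]
postings, ngram_index = build_index(docs, ["title", "desc"])
print(postings["python"])                      # [2]
print(fuzzy_candidates("pythn", ngram_index))  # ['python']
```

The n-gram map is what makes typo tolerance cheap: a misspelled token still shares most of its trigrams with the intended word, so candidates are found without scanning the whole vocabulary.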
Query Processing
- Query parsing (phrase, boolean, wildcard).
- Candidate set retrieval (skip-lists for fast seeking).
- Scoring (BM25 by default) ➜ boost hooks ➜ post-filtering.
- Faceting & pagination.
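The scoring and pagination steps can be sketched as follows. This is a textbook Okapi BM25 with the non-negative IDF variant, written under the assumption that documents are already tokenized. The function names `bm25_scores` and `ranked_page`, and the defaults `k1=1.5` and `b=0.75`, are illustrative choices rather than PyGenSearch's internals.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    if not tokenized_docs:
        return []
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # The "+ 1" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def ranked_page(scores, page=1, per_page=10):
    """Return (doc_index, score) pairs for one page, best first, skipping non-matches."""
    hits = sorted(((i, s) for i, s in enumerate(scores) if s > 0), key=lambda p: -p[1])
    start = (page - 1) * per_page
    return hits[start:start + per_page]

corpus = [
    ["python", "tips", "advanced", "tricks", "python"],
    ["learn", "ai", "today", "machine", "learning", "fun"],
]
scores = bm25_scores(["python"], corpus)
print(ranked_page(scores, page=1, per_page=10))  # only document 0 matches
```

A boost hook would fit between the two functions: multiply or add to each score before `ranked_page` sorts the hits.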

Data Flow

Integration Guides

Flask API Example

from flask import Flask, request, jsonify
from pygen_search import PyGenSearch

app = Flask(__name__)

data = [
    {"id": 1, "title": "Learn AI Today", "desc": "Machine Learning is fun.", "category": "tech"},
    {"id": 2, "title": "Python Tips", "desc": "Advanced tricks with Python.", "category": "programming"},
]
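# Build the index once at startup; every request reuses it.
engine = PyGenSearch(data=data, searchable_fields=["title", "desc"])

@app.route("/search")
def search():
    # Trim whitespace and short-circuit empty queries.
    query = request.args.get("q", "").strip()
    if not query:
        return jsonify([])
    results = engine.search(query)
    return jsonify(results)

if __name__ == "__main__":
    # debug=True is for local development only; never enable it in production.
    app.run(debug=True)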

Integrate with HTML/JS Frontend

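A minimal frontend can call the Flask endpoint directly. Because the request uses a relative URL, the page must be served from the same origin as the API (or the API must send CORS headers):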

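<input id="search" placeholder="Search...">
<ul id="results"></ul>
<script>
const list = document.getElementById('results');
document.getElementById('search').addEventListener('input', async function () {
    const res = await fetch('/search?q=' + encodeURIComponent(this.value));
    if (!res.ok) return;  // keep the previous results on HTTP errors
    const data = await res.json();
    // Build <li> nodes with textContent so titles are never parsed as HTML.
    list.replaceChildren(...data.map(item => {
        const li = document.createElement('li');
        li.textContent = item.title;
        return li;
    }));
});
</script>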

Integrate with React/Next.js

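In a React or Next.js app, call the same Flask endpoint. The relative `/search` URL assumes requests are proxied to the Flask server; otherwise use the API's full URL and enable CORS on the Flask side: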

// Example React component
import { useState } from "react";

function Search() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState([]);

  async function handleSearch(e) {
    setQuery(e.target.value);
    const res = await fetch(`/search?q=${encodeURIComponent(e.target.value)}`);
    const data = await res.json();
    setResults(data);
  }

  return (
    <div>
      <input value={query} onChange={handleSearch} placeholder="Search..." />
      <ul>
        {results.map(item => <li key={item.id}>{item.title}</li>)}
      </ul>
    </div>
  );
}

export default Search;

Advanced Usage

Custom Tokenizers

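Pass a callable that maps a string to a list of tokens via `tokenizer`. This example stems tokens with NLTK (`pip install nltk`):

from nltk.stem import SnowballStemmer
from pygen_search import PyGenSearch

stemmer = SnowballStemmer("english")

def custom_tokenizer(text: str) -> list[str]:
    # Lower-case, split on whitespace, then reduce each token to its stem.
    return [stemmer.stem(tok) for tok in text.lower().split()]

engine = PyGenSearch(
    data=documents,  # the document list from Basic Usage
    tokenizer=custom_tokenizer
)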
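If NLTK is too heavy a dependency, a tokenizer can be built from the standard library alone. The sketch below assumes, as the example above suggests, that `tokenizer` accepts any callable from `str` to a list of tokens; `regex_tokenizer` is a hypothetical name, and the accent folding and minimum token length are illustrative choices.

```python
import re
import unicodedata

def regex_tokenizer(text: str) -> list[str]:
    """Fold accents, lower-case, and keep alphanumeric runs of 2+ characters."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [tok for tok in re.findall(r"[a-z0-9]+", folded.lower()) if len(tok) > 1]

print(regex_tokenizer("Café-Society's Tips!"))  # ['cafe', 'society', 'tips']
```

Unlike a whitespace split, the regular expression also strips punctuation, so "Tips!" and "tips" index to the same token.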

Performance Optimization

Technique	When to Use	Gains
Batch Indexing (`batch_size`)	Large initial corpus	1.5-2× faster indexing
Cython Build (`pip install pygen-search[cython]`)	CPU-bound search	3-7× query speed-up
Memory Mapping (mmap index file)	Multi-process web servers	Shared index ≤ RAM
Sharding (`engine.split(shards=4)`)	>10M docs	Scales horizontally
Prefetch Cache (`LRUCache`)	Hot query patterns	60-80% lower latency

Tip: Profile first! `engine.diagnostics.profile(query="...")` prints token hit stats, posting-list scans, and ranking cost.
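To illustrate the caching idea from the table, here is a minimal sketch that uses the standard library's `functools.lru_cache` as a stand-in for the `LRUCache` mentioned above, whose API is not shown in this document. `make_cached_search` and `slow_search` are hypothetical names, and lower-casing the cache key assumes that search is case-insensitive.

```python
from functools import lru_cache

def make_cached_search(search_fn, maxsize=1024):
    """Wrap search_fn(query) in an LRU cache keyed on the normalized query."""
    @lru_cache(maxsize=maxsize)
    def _cached(normalized_query):
        # Store an immutable tuple so callers cannot mutate the cached value.
        return tuple(search_fn(normalized_query))

    def cached_search(query):
        # Collapse whitespace and case so equivalent queries share one entry.
        return list(_cached(" ".join(query.lower().split())))

    cached_search.cache_info = _cached.cache_info
    cached_search.cache_clear = _cached.cache_clear
    return cached_search

calls = []

def slow_search(query):  # stand-in for a real engine call
    calls.append(query)
    return [query.upper()]

search = make_cached_search(slow_search)
search("Python  Tips")
search("python tips")  # served from the cache
print(len(calls))      # 1
```

With live indexing, call `search.cache_clear()` after adding, updating, or deleting documents so that stale results are not served.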

Contributing

We 💜 contributions! To get started:

Fork the repo and create a branch: `git checkout -b feat/my-feature`.
Install the lint hooks (`pre-commit install`), then commit your changes.
Add tests under `tests/` and run them with `pytest`.
Open a PR; GitHub Actions will run CI automatically.

License

PyGenSearch is released under the MIT License. See the full text in LICENSE.

Made with ❤️ by PyGen Labs – because every app deserves great search.