Search is not just a feature – it’s the foundation of modern digital experiences.
PyGenSearch was born to save you time: writing a production-ready search engine from scratch is hard and distracting. Drop PyGenSearch into your app and get a ready-to-go, blazing-fast search engine in minutes.
## Why PyGenSearch?
| Pain Without PyGenSearch | Gain With PyGenSearch |
|---|---|
| Weeks building bespoke indexing | Minutes to production-ready search |
| Complex scoring & ranking logic | Sensible defaults, easy overrides |
| Scaling headaches when data grows | Memory-efficient index & async I/O |
| Boilerplate for every framework | First-class adapters for Flask, FastAPI, Django & more |
| Limited dev hours diverted from core product | Focus on your product, not search internals |
PyGenSearch fits perfectly when building:
- E-commerce (product discovery & faceted filtering)
- Content management systems (blog, docs, news)
- Internal tools & dashboards
- Knowledge bases / support portals
- Social & community platforms
## Key Features

### Core Capabilities
| Feature | Details |
|---|---|
| Lightning-Fast Search | Optimized in-memory inverted index built with Cython; sub-millisecond look-ups on ~50k docs. |
| Smart Matching | Fuzzy matching, typo-tolerance, n-gram & prefix queries, configurable similarity thresholds. |
| Relevance Ranking | TF-IDF & BM25 scoring out-of-the-box, custom weighting per field, boost hooks. |
| Live Indexing | Add / update / delete documents at runtime without blocking searches. |
| Developer Experience | Typed API, rich docstrings, Pydantic models, async/await variants, exhaustive examples. |
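To make the Relevance Ranking row concrete, here is a simplified, dependency-free sketch of BM25 scoring in pure Python. This is an illustration of the formula only, not PyGenSearch's actual (Cython-backed) implementation, and the function name `bm25_scores` is hypothetical:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # Document frequency: how many docs contain each term
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturates (k1) and is normalized by doc length (b)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "python is a versatile programming language".split(),
    "advanced tricks with python".split(),
    "machine learning is fun".split(),
]
print(bm25_scores(["python", "programming"], docs))
```

The document matching both query terms scores highest; a document matching neither scores zero.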
## Quick Start

### Installation

```bash
pip install pygen-search
```

### Basic Usage
```python
from pygen_search import PyGenSearch

documents = [
    {
        "id": 1,
        "title": "Getting Started with Python",
        "content": "Python is a versatile programming language loved by millions...",
        "tags": ["programming", "python", "beginners"],
        "category": "education",
        "date_published": "2024-09-10"
    },
    # ... more docs ...
]

engine = PyGenSearch(
    data=documents,
    searchable_fields=["title", "content", "tags"]
)

print(engine.search("python programming"))
```
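Typo-tolerant matching like PyGenSearch's relies on approximate string comparison. A minimal Levenshtein edit-distance check conveys the core idea — this is an illustrative sketch, not the engine's matcher; in PyGenSearch the similarity threshold is configured on the engine, and the names `edit_distance` / `fuzzy_match` are hypothetical:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(query: str, term: str, max_edits: int = 1) -> bool:
    """Accept a term if it is within max_edits edits of the query."""
    return edit_distance(query, term) <= max_edits

print(fuzzy_match("pyhton", "python", max_edits=2))  # True: a transposition costs 2 edits here
```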
## Core Concepts

### Search Architecture

1. **Indexing Layer**
   - Text preprocessing (lower-casing, stop-word removal, stemming/lemmatization).
   - Token extraction ➜ n-grams + prefixes for fuzzy matching.
   - Inverted index stored in compressed postings lists.
   - Configurable per-field boosts and weights.
2. **Query Processing**
   - Query parsing (phrase, boolean, wildcard).
   - Candidate set retrieval (skip-lists for fast seeking).
   - Scoring (BM25 by default) ➜ boost hooks ➜ post-filtering.
   - Faceting & pagination.
3. **Data Flow**
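The indexing layer's inverted index with prefix tokens can be illustrated with a toy version in pure Python. This is a sketch only — `build_index` is a hypothetical name, and the real index uses compressed postings lists rather than plain sets:

```python
from collections import defaultdict

def build_index(docs, prefix_min=3):
    """Map each token (and its prefixes of >= prefix_min chars) to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
            # Also index prefixes, so a partial query like "prog" finds "programming"
            for n in range(prefix_min, len(token)):
                index[token[:n]].add(doc_id)
    return index

docs = {
    1: "Getting Started with Python",
    2: "Advanced Python programming tricks",
}
index = build_index(docs)
print(sorted(index["python"]))  # [1, 2]
print(sorted(index["prog"]))    # [2]
```

At query time, candidate documents are the union (or intersection) of the postings sets for each query token, which is what keeps look-ups fast regardless of corpus size.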
## Integration Guides

### Flask API Example
```python
from flask import Flask, request, jsonify
from pygen_search import PyGenSearch

app = Flask(__name__)

data = [
    {"id": 1, "title": "Learn AI Today", "desc": "Machine Learning is fun.", "category": "tech"},
    {"id": 2, "title": "Python Tips", "desc": "Advanced tricks with Python.", "category": "programming"},
]
engine = PyGenSearch(data, searchable_fields=["title", "desc"])

@app.route("/search")
def search():
    query = request.args.get("q", "")
    results = engine.search(query)
    return jsonify(results)

if __name__ == "__main__":
    app.run(debug=True)
```
### Integrate with an HTML/JS Frontend

You can create a simple frontend that calls your Flask API:
```html
<input id="search" placeholder="Search...">
<ul id="results"></ul>

<script>
document.getElementById('search').addEventListener('input', async function () {
  const q = this.value;
  const res = await fetch('/search?q=' + encodeURIComponent(q));
  const data = await res.json();
  document.getElementById('results').innerHTML =
    data.map(item => `<li>${item.title}</li>`).join('');
});
</script>
```
### Integrate with React/Next.js

In your React or Next.js app, call your Flask API:
```jsx
// Example React component
import { useState } from "react";

function Search() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState([]);

  async function handleSearch(e) {
    setQuery(e.target.value);
    const res = await fetch(`/search?q=${encodeURIComponent(e.target.value)}`);
    const data = await res.json();
    setResults(data);
  }

  return (
    <div>
      <input value={query} onChange={handleSearch} placeholder="Search..." />
      <ul>
        {results.map(item => <li key={item.id}>{item.title}</li>)}
      </ul>
    </div>
  );
}

export default Search;
```
## Advanced Usage

### Custom Tokenizers

```python
from nltk.stem import SnowballStemmer
from pygen_search import PyGenSearch

stemmer = SnowballStemmer("english")

def custom_tokenizer(text: str):
    return [stemmer.stem(tok) for tok in text.lower().split()]

engine = PyGenSearch(
    data=docs,  # your list of documents
    tokenizer=custom_tokenizer
)
```
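If you prefer to avoid the NLTK dependency, a dependency-free tokenizer with light suffix stripping can serve the same hook. This assumes only the contract shown above — a callable that takes a string and returns a list of tokens; the suffix list and `simple_tokenizer` name are illustrative, not part of PyGenSearch:

```python
import re

# Common English suffixes, stripped crudely (a far cry from a real stemmer)
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def simple_tokenizer(text: str):
    """Lowercase, split on non-letters, and strip one common suffix per token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stemmed = []
    for tok in tokens:
        for suf in SUFFIXES:
            # Only strip when a reasonable stem (>= 3 chars) remains
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed

print(simple_tokenizer("Searching indexed documents quickly"))
# ['search', 'index', 'document', 'quick']
```

Crude suffix stripping will mangle some words (e.g. irregular forms), so prefer a real stemmer for production-quality recall.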
### Performance Optimization

| Technique | When to Use | Gains |
|---|---|---|
| Batch Indexing | Large initial corpus | 1.5-2× faster indexing |
| Cython Build | CPU-bound search | 3-7× query speed-up |
| Memory Mapping (mmap index file) | Multi-process web servers | Shared index ≤ RAM |
| Sharding | >10M docs | Scales horizontally |
| Prefetch Cache | Hot query patterns | 60-80% latency drop |
> **Tip:** Profile first! `engine.diagnostics.profile(query="...")` prints token hit stats, posting-list scans, and ranking cost.
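The prefetch-cache idea from the table can be approximated in-process with the standard library's `functools.lru_cache`. This is a generic sketch for caching hot queries around any search callable — `CachedEngine` is a hypothetical wrapper, not PyGenSearch's built-in cache layer:

```python
from functools import lru_cache

class CachedEngine:
    """Wrap a slow search callable with an in-process LRU cache for hot queries."""

    def __init__(self, search_fn, maxsize=1024):
        self._search = lru_cache(maxsize=maxsize)(search_fn)

    def search(self, query: str):
        return self._search(query)

    def stats(self):
        # CacheInfo(hits, misses, maxsize, currsize)
        return self._search.cache_info()

calls = []

def slow_search(q):
    calls.append(q)       # stand-in for an expensive index lookup
    return (q.upper(),)   # return an immutable tuple, safe to share from the cache

engine = CachedEngine(slow_search)
engine.search("python")
engine.search("python")          # served from cache; slow_search runs only once
print(engine.stats().hits)       # 1
```

Note that cached results must be treated as shared and immutable, and the cache must be invalidated (e.g. rebuilt) whenever documents are added or updated, or repeated queries will see stale results.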
## Contributing

We 💜 contributions! To get started:

1. Fork the repo & create your branch: `git checkout -b feat/my-feature`.
2. Commit your changes with linting (`pre-commit install`).
3. Write tests in `tests/` (pytest).
4. Submit a PR – GitHub Actions will run CI automatically.
## License
PyGenSearch is released under the MIT License. See the full text in LICENSE.
Made with ❤️ by PyGen Labs – because every app deserves great search.