Skip to content

Commit 3b30bcd

Browse files
author
Daniele Briggi
committed
feat(semsearch): example semantic search
1 parent d35035a commit 3b30bcd

24 files changed

+451
-0
lines changed

examples/semantic_search/README.md

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
## sqlite-vector Semantic Search Example
2+
3+
This example in Python demonstrates how to build a semantic search engine using the [sqlite-vector](https://github.com/sqliteai/sqlite-vector) extension and a Sentence Transformer model. It allows you to index and search documents using vector similarity, powered by a local LLM embedding model.
4+
5+
### How it works
6+
7+
- **Embeddings**: Uses [sentence-transformers](https://huggingface.co/sentence-transformers) to generate dense vector representations (embeddings) for text. The default model is [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), a fast, lightweight model (384 dimensions) suitable for semantic search and retrieval tasks.
8+
- **Vector Store and Search**: Embeddings are stored in SQLite using the [`sqlite-vector`](https://github.com/sqliteai/sqlite-vector) extension, enabling fast similarity search (cosine distance) directly in the database.
9+
- **Sample Data**: The `samples/` directory contains example documents you can index and search immediately.
10+
11+
### Installation
12+
13+
```bash
14+
$ python -m venv venv
15+
16+
$ source venv/bin/activate
17+
18+
$ pip install -r requirements.txt
19+
```
20+
21+
On first use, the required model will be downloaded automatically.
22+
23+
### Usage
24+
25+
Use the interactive mode to keep the model in memory and run multiple queries efficiently:
26+
27+
```bash
28+
python semsearch.py --repl
29+
30+
# Index a directory of documents
31+
semsearch> index ./samples
32+
33+
# Search for similar documents
34+
semsearch> search "neural network architectures for image recognition"
35+
```
36+
37+
### Example Queries
38+
39+
Try these queries to test semantic similarity:
40+
41+
- "neural network architectures for image recognition"
42+
- "reinforcement learning in autonomous systems"
43+
- "explainable artificial intelligence methods"
44+
- "AI governance and regulatory compliance"
45+
- "network intrusion detection systems"
46+
47+
**Note:**
48+
- Supported extension are `.md`, `.txt`, `.py`, `.js`, `.html`, `.css`, `.sql`, `.json`, `.xml`.
49+
- For more details, see the code in `semsearch.py` and `semantic_search.py`.
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
sentence-transformers
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 1: Deep Learning Neural Networks
2+
3+
Deep learning utilizes artificial neural networks with multiple layers to process and learn from vast amounts of data. These networks automatically discover intricate patterns and representations without manual feature engineering. Convolutional neural networks excel at image recognition tasks, while recurrent neural networks handle sequential data like text and speech. Popular frameworks include TensorFlow, PyTorch, and Keras. Deep learning has revolutionized computer vision, natural language processing, and speech recognition applications.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 10: Zero Trust Security Architecture
2+
3+
Zero trust security operates on the principle of "never trust, always verify," requiring authentication and authorization for every access request regardless of location. This approach assumes breach scenarios and implements continuous verification throughout the network. Key components include identity verification, device compliance checking, least privilege access, and micro-segmentation. Zero trust frameworks help organizations protect against insider threats and advanced persistent attacks.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 11: Incident Response and Recovery
2+
3+
Effective incident response requires predefined procedures for detecting, containing, and recovering from security breaches. Response teams follow structured phases: preparation, identification, containment, eradication, recovery, and lessons learned. Critical activities include forensic analysis, stakeholder communication, system restoration, and process improvement. Regular tabletop exercises and response plan updates ensure organizations can quickly minimize damage and restore normal operations after security incidents.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 12: Machine Learning for Malware Detection
2+
3+
Machine learning enhances malware detection by analyzing file characteristics, behavioral patterns, and network communications to identify threats. Static analysis examines file properties without execution, while dynamic analysis observes runtime behavior in controlled environments. Ensemble methods combining multiple algorithms improve detection accuracy and reduce false positives. AI-powered systems can identify zero-day threats and polymorphic malware that traditional signature-based solutions miss.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 13: Behavioral Analytics for Anomaly Detection
2+
3+
Behavioral analytics leverages machine learning to establish baseline patterns of normal user and system behavior, flagging deviations that may indicate security threats. User and entity behavior analytics (UEBA) systems monitor login patterns, data access, and application usage to detect insider threats and compromised accounts. Machine learning models adapt to changing behavior patterns while maintaining sensitivity to subtle anomalies that human analysts might overlook.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 14: AI-Driven Security Orchestration
2+
3+
Security orchestration platforms integrate multiple security tools and automate incident response workflows using artificial intelligence. These systems correlate alerts from various sources, prioritize threats based on risk assessment, and execute automated remediation actions. Natural language processing helps analyze threat intelligence reports, while machine learning improves decision-making accuracy over time. Orchestration reduces response times and analyst workload while maintaining consistent security procedures.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 15: Advanced Persistent Threats (APTs)
2+
3+
Advanced persistent threats represent sophisticated, long-term cyberattacks typically conducted by nation-states or organized criminal groups. APTs use multiple attack vectors, maintain persistent access, and employ stealth techniques to avoid detection. Common tactics include spear-phishing, zero-day exploits, living-off-the-land techniques, and lateral movement within networks. Defense requires continuous monitoring, threat hunting, and intelligence-driven security strategies to detect and neutralize these patient adversaries.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Article 16: Social Engineering Attack Vectors
2+
3+
Social engineering exploits human psychology rather than technical vulnerabilities to gain unauthorized access to systems and information. Common techniques include phishing emails, pretexting phone calls, baiting with infected media, and physical tailgating. Attackers research targets through social media and public information to craft convincing scenarios. Defense requires security awareness training, verification procedures, and creating organizational cultures that encourage reporting suspicious communications.

0 commit comments

Comments
 (0)