Commit ad0791e

Expand README for src/search (#58906)
1 parent f790db7 commit ad0791e

1 file changed: +84 -0 lines changed

src/search/README.md

Lines changed: 84 additions & 0 deletions
@@ -80,3 +80,87 @@ The preferred way to build and sync the search indices is to do so via the [GitH
- Our search querying has lots of controls for customizing each index, so we can add weights to certain attributes and create rules like "title is more important than body", etc. But it works pretty well as-is without any configuration.
- Our search querying has support for "advanced query syntax" for exact matching of quoted expressions and exclusion of words preceded by a `-` sign. This is off by default, but it is enabled in our browser client. The settings in the web interface can be overridden by the search endpoint. See [middleware/search.ts](middleware/search.ts).
- When needed, the Docs Engineering team can commit updates to the search index, as long as the label `skip-index-check` is applied to the PR.
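
As an illustration of the advanced query syntax described in the list above, the following TypeScript sketch splits a query string into quoted phrases, excluded words, and remaining terms. `parseAdvancedQuery` and `ParsedQuery` are hypothetical names introduced here. This is not the parsing done in `middleware/search.ts` or in Elasticsearch; it only shows what the syntax is meant to express.

```typescript
// Illustrative only: hypothetical helper, not the real search implementation.
interface ParsedQuery {
  phrases: string[]; // quoted expressions, to be matched exactly
  excluded: string[]; // words that were preceded by "-"
  terms: string[]; // everything else
}

function parseAdvancedQuery(query: string): ParsedQuery {
  const parsed: ParsedQuery = { phrases: [], excluded: [], terms: [] };
  // Match either a quoted expression or a run of non-whitespace characters.
  const tokenPattern = /"([^"]+)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(query)) !== null) {
    const phrase = match[1];
    const token = match[2];
    if (phrase !== undefined) {
      parsed.phrases.push(phrase);
    } else if (token !== undefined) {
      if (token.length > 1 && token.charAt(0) === '-') {
        parsed.excluded.push(token.slice(1));
      } else {
        parsed.terms.push(token);
      }
    }
  }
  return parsed;
}
```

For example, `parseAdvancedQuery('"pull request" review -draft')` yields one phrase (`pull request`), one plain term (`review`), and one excluded word (`draft`).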

## Ownership & On-call

### Ownership
- **Team**: Docs Engineering
- **Primary contacts**: @docs-engineering (GitHub team)
- **Search infrastructure**: Internal Elasticsearch cluster for autocomplete and general search results, and an external RAG app ([cse-copilot](https://github.com/github/cse-copilot)) owned by @github/customer-success-engineering for LLM-generated responses
- **Slack**: #docs-engineering

### On-call procedures
If search is not working:
1. **Check search health**
   - Test search on docs.github.com
   - Check Elasticsearch cluster status (internal)
   - Review recent deploys and index updates

2. **Index issues**
   - Check `.github/workflows/index-general-search.yml` logs
   - Verify last successful index run
   - Test manual index update for single version/language

3. **API issues**
   - Check `/api/search/v1` endpoint
   - Review middleware logs for errors
   - Test search queries directly against API
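
For the "API issues" step above, a query can be sent straight to the endpoint. The sketch below builds such a URL and maps an HTTP status to the on-call step it most likely points to. `buildSearchUrl`, `triageHint`, and the `query`, `version`, and `language` parameter names are assumptions made for illustration; confirm the accepted parameters in `middleware/search.ts` before relying on them.

```typescript
// Hypothetical helpers for manual triage; parameter names are assumptions.
interface SearchCheckOptions {
  baseUrl: string; // e.g. "https://docs.github.com"
  query: string;
  version?: string; // assumed parameter name
  language?: string; // assumed parameter name
}

function buildSearchUrl(options: SearchCheckOptions): string {
  const params: string[] = ['query=' + encodeURIComponent(options.query)];
  if (options.version) params.push('version=' + encodeURIComponent(options.version));
  if (options.language) params.push('language=' + encodeURIComponent(options.language));
  // Drop trailing slashes so the path is not doubled up.
  return options.baseUrl.replace(/\/+$/, '') + '/api/search/v1?' + params.join('&');
}

// Rough mapping from HTTP status to the on-call step worth checking next.
function triageHint(status: number): string {
  if (status >= 200 && status < 300) {
    return 'API responded; if results look wrong, check the index workflow logs';
  }
  if (status >= 500) {
    return 'server error; review middleware logs and Elasticsearch cluster status';
  }
  return 'unexpected status; check the request parameters and the endpoint path';
}

// Usage (Node 18+ or a browser, where fetch is global):
//   const response = await fetch(buildSearchUrl({ baseUrl, query: 'actions', language: 'en' }));
//   console.log(response.status, triageHint(response.status));
```

Keeping URL construction separate from the network call makes it easy to paste the generated URL into a browser or `curl` during an incident.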

## Roadmap Items

### High priority improvements
- **Real-time indexing** - Reduce lag between content changes and search index
- **Relevance tuning** - Improve search result ranking and quality
- **Performance optimization** - Faster search queries and results
- **Version handling** - Better support for version-specific search
- **Language support** - Improve multilingual search quality

### Medium priority enhancements
- **Faceted search** - Filter by product, version, content type
- **Search analytics** - Track what users are searching for
- **Did you mean** - Suggest corrections for misspellings
- **Related searches** - Show similar or related queries
- **Result previews** - Better snippets and highlighting

### AI search improvements
- **Query understanding** - Better interpret user intent
- **Answer generation** - Provide direct answers, not just links
- **Contextual results** - Consider user's current page/version
- **Personalization** - Learn from search patterns

### Technical improvements
- **Index efficiency** - Reduce index size and update time
- **Cache optimization** - Improve query caching
- **API versioning** - Stable search API with version control
- **Testing coverage** - More comprehensive search tests
- **Error handling** - Better error messages and recovery

### Infrastructure enhancements
- **Elasticsearch upgrade** - Keep cluster up to date
- **Redundancy** - Improve search availability
- **Monitoring** - Better observability of search health
- **Cost optimization** - Reduce Elasticsearch costs

### Content quality
- **Index validation** - Ensure all pages are indexed correctly
- **Freshness indicators** - Show when content was last updated
- **Broken link detection** - Identify 404s in search results
- **Duplicate detection** - Prevent duplicate results

Search is largely in KTLO (keep the lights on) mode. We will continue to ensure that search works as expected and to support updates to both Elasticsearch and the Copilot models underlying our search.

## Known Limitations

### Current constraints
- **Index lag** - 24-hour delay between content changes and search updates
- **Manual triggers** - Urgent updates require a manual workflow run
- **Full reindex** - Individual pages can't be updated incrementally
- **Version complexity** - Hard to search across all versions simultaneously
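
Given the 24-hour index lag listed above, one way to decide whether a manual workflow run is warranted is to compare the last successful index run against that window. The following is a minimal sketch; `isIndexStale`, `hoursSince`, and the idea of treating anything older than 24 hours as stale are assumptions for illustration, not an existing check in this repository.

```typescript
// Hypothetical freshness check; the 24-hour default mirrors the index lag above.
const HOUR_MS = 60 * 60 * 1000;

function hoursSince(lastSuccessfulRun: Date, now: Date): number {
  return (now.getTime() - lastSuccessfulRun.getTime()) / HOUR_MS;
}

function isIndexStale(lastSuccessfulRun: Date, now: Date, maxLagHours: number = 24): boolean {
  // Stale means the last successful run is older than the expected lag window.
  return hoursSince(lastSuccessfulRun, now) > maxLagHours;
}
```

The timestamp of the last successful run would have to be read from the workflow run history by hand or by a separate script; that lookup is deliberately left out here.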

### Performance considerations
- A full index rebuild takes ~40 minutes for all versions/languages
- A single version/language takes ~5-10 minutes
- Search queries are cached, but the cache can become stale
- High search volume can impact the Elasticsearch cluster
