JSK9999 · a7medsa22 · Mar 8, 2026 · Mar 8, 2026 · Mar 8, 2026 · Mar 10, 2026
diff --git a/config/skills/aws.md b/config/skills/aws.md
@@ -0,0 +1,74 @@
+---
+description: AWS cloud infrastructure and serverless best practices
+keywords: [aws, lambda, iam, s3, dynamodb, rds, cloudformation, cdk, serverless, ec2, cloudwatch, security, cost-optimization]
+last_updated: 2026-03
+---
+
+# AWS Cloud Infrastructure
+
+Actionable rules for secure, cost-effective AWS development. Focus: serverless, IaC, and managed services.
+
+## When to activate
+- Designing serverless architectures or Lambda functions
+- Configuring IAM policies, roles, or permissions
+- Using S3, DynamoDB, RDS, or other AWS services
+- Writing CloudFormation/CDK templates
+- Auditing security, cost, or compliance
+
+## IAM: least privilege essentials
+- Grant minimal permissions; use `Action`/`Resource` constraints, never `*` unless justified.
+- Use roles for services (e.g., Lambda execution role), not user credentials.
+- Enable MFA for root and privileged users; rotate access keys regularly.
+- Use IAM Policy Simulator to test permissions before deploy.
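+
+A minimal sketch of what scoped `Action`/`Resource` looks like in practice, assuming a single DynamoDB table. The table name, region, and account ID are hypothetical inputs supplied by the caller (for example from stack parameters) rather than hardcoded. `findWildcards` is an illustrative helper, not an AWS tool, and does not replace the Policy Simulator.
+
+```javascript
+// Build a least-privilege policy for one DynamoDB table (all names are caller-supplied).
+function buildTablePolicy({ region, accountId, tableName }) {
+  return {
+    Version: '2012-10-17',
+    Statement: [
+      {
+        Sid: 'TableReadWrite',
+        Effect: 'Allow',
+        // Only the calls the function actually makes
+        Action: ['dynamodb:GetItem', 'dynamodb:PutItem', 'dynamodb:Query'],
+        // Exactly one table, never "*"
+        Resource: `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`,
+      },
+    ],
+  };
+}
+
+// Flag Allow statements whose Action or Resource is a wildcard ("*" or "service:*").
+function findWildcards(policy) {
+  const findings = [];
+  policy.Statement.forEach((stmt, i) => {
+    if (stmt.Effect !== 'Allow') return;
+    const actions = [].concat(stmt.Action ?? []);
+    const resources = [].concat(stmt.Resource ?? []);
+    if (actions.some((a) => a === '*' || a.endsWith(':*'))) findings.push(`Statement ${i}: wildcard Action`);
+    if (resources.includes('*')) findings.push(`Statement ${i}: wildcard Resource`);
+  });
+  return findings;
+}
+```
+
+`findWildcards(buildTablePolicy({ region: 'us-east-1', accountId: '111122223333', tableName: 'orders' }))` returns `[]`, while a statement with `Action: 's3:*'` and `Resource: '*'` produces two findings.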
+
+## Lambda/serverless patterns
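+- Keep functions small and single-purpose; keep synchronous timeouts short (~5 min at most) and move longer work to async processing. Lambda's hard limit is 15 min.
+- Use environment variables for config; store secrets in AWS Secrets Manager.
+- Enable X-Ray for tracing; use reserved concurrency to guarantee capacity for critical functions and to cap others so they cannot exhaust account concurrency or overwhelm downstream resources.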
+- Package dependencies efficiently; use layers for shared code.
+
+```javascript
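+// Lambda handler skeleton (Node.js, API Gateway proxy event)
+exports.handler = async (event) => {
+  let body;
+  try {
+    body = JSON.parse(event.body || '{}') || {};
+  } catch {
+    return { statusCode: 400, body: JSON.stringify({ error: 'Invalid JSON body' }) };
+  }
+  const { action, payload } = body;
+  if (!action) return { statusCode: 400, body: JSON.stringify({ error: '`action` is required' }) };
+  // Validate `payload` for this action → process → return the result
+  return { statusCode: 200, body: JSON.stringify({ result: 'ok', action }) };
+};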
+```
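+
+Secrets Manager lookups add latency and a per-call cost, so a common approach is to fetch a secret once per execution environment and reuse it across warm invocations. A minimal sketch, assuming a hypothetical `fetchSecret` function that wraps the real Secrets Manager call (for example `GetSecretValue` via the AWS SDK); it is injected so the caching logic stays self-contained.
+
+```javascript
+// Cache one secret at module scope; warm invocations reuse it until `ttlMs` elapses.
+// `fetchSecret` is hypothetical: any async function returning the secret value.
+function makeSecretCache(fetchSecret, ttlMs = 5 * 60 * 1000) {
+  let pending = null; // cached promise, shared by concurrent callers
+  let fetchedAt = 0;
+  return function getSecret() {
+    if (!pending || Date.now() - fetchedAt > ttlMs) {
+      fetchedAt = Date.now();
+      const p = fetchSecret().catch((err) => {
+        if (pending === p) pending = null; // do not cache failures; the next call retries
+        throw err;
+      });
+      pending = p;
+    }
+    return pending;
+  };
+}
+```
+
+In a handler module, create the cache at module scope (e.g. `const getApiKey = makeSecretCache(fetchApiKey);` with a hypothetical `fetchApiKey`) and `await getApiKey()` inside `exports.handler`, so cold starts pay for the lookup once.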
+
+## Service quick guidelines
+- **S3**: Enable versioning + encryption (SSE-S3/KMS); use lifecycle rules for cost; block public access by default.
+- **DynamoDB**: Use on-demand for unpredictable traffic; design PK/SK for query patterns; add GSIs sparingly.
+- **RDS**: Manage engine settings with custom parameter groups (default groups can't be modified); enable automated backups; prefer Aurora Serverless for variable loads.
+- **CloudWatch**: Set alarms on errors, duration, throttles; use structured logs.
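+
+"Design PK/SK for query patterns" means deriving keys from the reads you need. A minimal single-table sketch for two hypothetical access patterns (fetch a customer profile; list a customer's orders since a date). The `CUSTOMER#`/`ORDER#` prefixes and the `PK`/`SK` attribute names are a common convention, not something DynamoDB requires; `ordersSinceQuery` only builds the input you would pass to a Query call (for example `QueryCommand` from `@aws-sdk/lib-dynamodb`).
+
+```javascript
+// Access pattern 1: customer profile -> PK = CUSTOMER#<id>, SK = PROFILE
+const profileKey = (customerId) => ({ PK: `CUSTOMER#${customerId}`, SK: 'PROFILE' });
+
+// Access pattern 2: orders by date -> same PK, SK = ORDER#<ISO date>#<orderId>.
+// ISO-8601 strings sort lexicographically in chronological order.
+const orderKey = (customerId, isoDate, orderId) => ({
+  PK: `CUSTOMER#${customerId}`,
+  SK: `ORDER#${isoDate}#${orderId}`,
+});
+
+// Query input (DocumentClient-style plain values) for a customer's orders since a date.
+function ordersSinceQuery(tableName, customerId, sinceIsoDate) {
+  return {
+    TableName: tableName,
+    KeyConditionExpression: 'PK = :pk AND SK BETWEEN :from AND :to',
+    ExpressionAttributeValues: {
+      ':pk': `CUSTOMER#${customerId}`,
+      ':from': `ORDER#${sinceIsoDate}`,
+      ':to': 'ORDER#~', // '~' sorts after digits, bounding all ORDER# keys and excluding PROFILE
+    },
+  };
+}
+```
+
+Both item types share a partition, so related data stays together, and the `BETWEEN` bound keeps the `PROFILE` item out of date-range order queries.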
+
+## CloudFormation/CDK conventions
+- Use CDK (TypeScript/Python) for complex logic; CloudFormation YAML for simple stacks.
+- Parameterize environments (`dev`/`prod`); avoid hardcoding account IDs/regions.
+- Use `cdk diff` before deploy; enable termination protection for prod stacks.
+- Tag all resources: `App`, `Env`, `Owner` for cost allocation.
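+
+A minimal sketch of these conventions, written as a JavaScript function that emits a CloudFormation template in JSON form: an `Env` parameter restricted to `dev`/`prod`, account ID and region taken from the `AWS::AccountId`/`AWS::Region` pseudo parameters via `Fn::Sub` instead of hardcoded values, and the three cost-allocation tags. The app and owner names are hypothetical; the bucket settings follow the S3 guidelines above.
+
+```javascript
+// Emit a CloudFormation template (JSON) for one tagged, env-parameterized S3 bucket.
+// `app` must be lowercase because it becomes part of the bucket name.
+function bucketTemplate({ app, owner }) {
+  return {
+    AWSTemplateFormatVersion: '2010-09-09',
+    Parameters: {
+      Env: { Type: 'String', AllowedValues: ['dev', 'prod'], Default: 'dev' },
+    },
+    Resources: {
+      DataBucket: {
+        Type: 'AWS::S3::Bucket',
+        Properties: {
+          // \${...} is escaped so CloudFormation, not JavaScript, resolves these at deploy time
+          BucketName: { 'Fn::Sub': `${app}-\${Env}-\${AWS::AccountId}-\${AWS::Region}` },
+          VersioningConfiguration: { Status: 'Enabled' },
+          BucketEncryption: {
+            ServerSideEncryptionConfiguration: [
+              { ServerSideEncryptionByDefault: { SSEAlgorithm: 'aws:kms' } },
+            ],
+          },
+          PublicAccessBlockConfiguration: {
+            BlockPublicAcls: true,
+            BlockPublicPolicy: true,
+            IgnorePublicAcls: true,
+            RestrictPublicBuckets: true,
+          },
+          Tags: [
+            { Key: 'App', Value: app },
+            { Key: 'Env', Value: { Ref: 'Env' } },
+            { Key: 'Owner', Value: owner },
+          ],
+        },
+      },
+    },
+  };
+}
+```
+
+Writing `JSON.stringify(bucketTemplate({ app: 'orders-app', owner: 'payments-team' }), null, 2)` to a file gives a template that could be deployed with `aws cloudformation deploy --template-file <file> --stack-name <name> --parameter-overrides Env=prod`.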
+
+## ❌ Anti-patterns
+- Hardcoding credentials or account IDs in code/templates
+- Using `iam:PassRole` with `Resource: "*"`
+- Provisioned concurrency for all Lambdas (costly)
+- S3 buckets with public read/write by default
+- Monolithic Lambda functions handling multiple unrelated tasks
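+
+For the `iam:PassRole` anti-pattern, the usual fix is to name the exact role ARN and restrict which service it may be passed to with the `iam:PassedToService` condition key. A minimal sketch; the account ID, role name, and service are hypothetical parameters.
+
+```javascript
+// Allow passing exactly one role, and only to the named AWS service.
+function passRoleStatement({ accountId, roleName, service }) {
+  return {
+    Sid: 'PassSingleRole',
+    Effect: 'Allow',
+    Action: 'iam:PassRole',
+    Resource: `arn:aws:iam::${accountId}:role/${roleName}`,
+    Condition: { StringEquals: { 'iam:PassedToService': service } },
+  };
+}
+```
+
+For example, `passRoleStatement({ accountId: '111122223333', roleName: 'orders-fn-role', service: 'lambda.amazonaws.com' })` lets a deployment principal attach that one execution role to Lambda functions and nothing else.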
+
+## Quick checklist (PRs)
+- [ ] IAM policies scoped to specific actions/resources
+- [ ] Secrets in Secrets Manager/SSM, not plain env vars
+- [ ] S3/DynamoDB encryption enabled at rest
+- [ ] CloudFormation/CDK templates parameterized per env
+- [ ] Tags (`App`, `Env`, `Owner`) applied for cost tracking
+- [ ] CloudWatch alarms configured for errors/throttles
+
+## Example prompts (semantic router)
+```
+"aws lambda best practices"
+"iam least privilege policy example"
+"s3 encryption lifecycle rules"
+"dynamodb partition key design"
+"cdk stack parameterization"
+"aws serverless cost optimization"
+"cloudformation vs cdk when to use"
+"aws secrets manager vs ssm parameters"
+```
diff --git a/config/skills/database.md b/config/skills/database.md
@@ -54,4 +54,4 @@ const result = await db.query(
 ```
 - Apply least-privilege for DB users
 - Avoid storing secrets; use connection strings from env
-- Audit sensitive operations (DDL, bulk deletes)
+- Audit sensitive operations (DDL, bulk deletes)
diff --git a/config/skills/mongodb.md b/config/skills/mongodb.md
@@ -0,0 +1,182 @@
+---
+description: Practical MongoDB / NoSQL best practices for schema design, indexing, aggregation pipeline optimization, and secure production usage.
+keywords: [mongodb, nosql, mongoose, aggregation, index, indexing, compound, ttl, text, schema, embedding, referencing, sharding, transactions, performance, query, pipeline]
+---
+
+# MongoDB (NoSQL)
+
+Practical guidance for building efficient, maintainable, and secure MongoDB-backed applications. Focus is on actionable rules, decision heuristics, and small examples.
+
+## When to activate
+- Building or optimizing MongoDB queries or aggregations
+- Designing collection/document schemas
+- Adding or auditing indexes
+- Implementing security for stored data or connections
+- Working on sharding, transactions, or time-series data
+
+---
+
+## Indexing: when & which type
+- **Single-field**: good for high-cardinality single-field lookups.  
+- **Compound**: use when queries filter/sort by multiple fields. Order keys by the ESR guideline: fields matched by equality first, then sort fields, then range filters.
+  - Example: `{ age: 1, status: 1 }` helps queries with `age` and `status`.
+  - Order matters: `{ status: 1, age: 1 }` won't help a query that only filters by `age`.
+- **Text index**: use for full-text search across string fields. A collection can have only one text index (it may cover several fields); for heavy search workloads use a dedicated engine such as Atlas Search.
+- **TTL index**: use for expiring ephemeral data (sessions, caches). Set `expireAfterSeconds`.
+- **Wildcard index** (`{ "$**": 1 }`): use sparingly for schemas with many dynamic fields; it indexes every matching field, which adds write latency and storage overhead, so benchmark before adopting it.
+- **Practical checks**
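+  - Run `db.collection.find(query).explain('executionStats')` to check index usage (look for `IXSCAN` rather than `COLLSCAN`).
+  - Avoid indexes on very high-write, low-read fields.
+  - Remove unused indexes (they cost writes and storage): find them with `$indexStats`, list them with `db.collection.getIndexes()`, then drop them with `db.collection.dropIndex()`.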
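+
+The ordering rules above can be made concrete with two small helpers. This is a simplified model, not how the query planner actually works: `esrIndexSpec` orders keys by the Equality, Sort, Range guideline, and `usableIndexPrefix` checks which leading keys of a compound index an equality-only filter can use. Both function names are hypothetical.
+
+```javascript
+// Order compound-index keys: Equality fields, then Sort fields (keeping direction), then Range fields.
+function esrIndexSpec({ equality = [], sort = {}, range = [] }) {
+  const spec = new Map();
+  for (const field of equality) spec.set(field, 1);
+  for (const [field, dir] of Object.entries(sort)) if (!spec.has(field)) spec.set(field, dir);
+  for (const field of range) if (!spec.has(field)) spec.set(field, 1);
+  return Object.fromEntries(spec); // key order = ESR order
+}
+
+// Return the leading index keys covered by the filtered fields, or null when the
+// first index key is not filtered (the index prefix rule).
+function usableIndexPrefix(indexSpec, filterFields) {
+  const keys = Object.keys(indexSpec);
+  const filtered = new Set(filterFields);
+  let n = 0;
+  while (n < keys.length && filtered.has(keys[n])) n += 1;
+  return n > 0 ? keys.slice(0, n) : null;
+}
+```
+
+`esrIndexSpec({ equality: ['status'], sort: { createdAt: -1 }, range: ['amount'] })` yields `{ status: 1, createdAt: -1, amount: 1 }`, the shape you would pass to `createIndex` or `schema.index`, and `usableIndexPrefix({ status: 1, age: 1 }, ['age'])` returns `null`, matching the `age`-only example above.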
+
+### Mongoose index snippets
+```javascript
+// Compound index (`background: true` is ignored on MongoDB 4.2+, where builds lock only briefly at start and end)
+userSchema.index({ email: 1, status: 1 });
+
+// TTL index for auto-expiry (e.g., sessions expire after 24h)
+sessionSchema.index({ createdAt: 1 }, { expireAfterSeconds: 86400 });
+
+// Text index with field weights
+postSchema.index({ title: 'text', body: 'text' }, { weights: { title: 3, body: 1 } });
+
+// Sparse index for optional fields (only indexes docs where field exists)
+profileSchema.index({ twitterHandle: 1 }, { sparse: true });
+```
+
+---
+
+## Embedding vs Referencing — practical decision rules
+- **Embed** when:
+  - One-to-few relationships (comments on a post <100).
+  - Data accessed together most of the time.
+  - Example: post with small list of tags, metadata.
+- **Reference** when:
+  - One-to-many with large or growing lists (orders, logs).
+  - The child is large or independently updated.
+  - Many-to-many relationships.
+- **Hybrid approach**:
+  - Denormalize frequently-read fields (e.g., username snapshot on comment) but keep authoritative source in referenced doc.
+- **Quick rule**: model around access patterns, not perfectly normalized schema.
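+
+The rules above can be encoded as a rough heuristic for design reviews. A minimal sketch; the function name and its inputs are hypothetical, and the 100-child threshold mirrors the one-to-few guideline rather than any MongoDB limit.
+
+```javascript
+// Rough embed-vs-reference heuristic following the rules above.
+function chooseRelationship({ maxChildren, readTogether, childUpdatedIndependently = false, manyToMany = false }) {
+  if (manyToMany) return 'reference';                // many-to-many
+  if (!(maxChildren <= 100)) return 'reference';     // large, growing, or unknown (Infinity/undefined)
+  if (childUpdatedIndependently) return 'reference'; // child has its own lifecycle
+  return readTogether ? 'embed' : 'reference';       // one-to-few: embed only if read together
+}
+```
+
+Tags on a post (`{ maxChildren: 20, readTogether: true }`) come out as `'embed'`, while a customer's orders (`{ maxChildren: Infinity, readTogether: false }`) come out as `'reference'`; the hybrid approach still applies on top, copying hot fields even when the answer is `'reference'`.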
+
+### Mongoose embedding example with validation
+```javascript
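+// Assumes `postSchema` is an existing Mongoose schema
+const { Schema } = require('mongoose');
+
+const commentSchema = new Schema({
+  userId: { type: Schema.Types.ObjectId, ref: 'User', required: true },
+  text: { type: String, maxlength: 500, required: true },
+  createdAt: { type: Date, default: Date.now }
+});
+
+postSchema.add({
+  comments: {
+    type: [commentSchema],
+    // Validate the array itself; a separate counter field would not stay in sync on its own
+    validate: {
+      validator: v => v.length <= 100,
+      message: 'Max 100 embedded comments; use referencing for more'
+    }
+  }
+});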
+```
+
+---
+
+## Data modeling patterns (common practical patterns)
+- **Bucket pattern**: group many small time-series events into monthly/day buckets to avoid huge arrays.
+  ```javascript
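+  // Example: metrics bucketed per device per day
+  {
+    deviceId: "sensor-42",              // a separate field, not _id: each device has many buckets
+    bucketDate: ISODate("2026-03-08"),
+    count: ...,                         // readings stored so far; start a new bucket at ~1000
+    readings: [ { ts: ..., value: ... }, ... ]
+  }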
+  ```
+- **Outlier pattern**: keep typical documents compact; move unusually large data to separate collection with reference.
+- **Polymorphic / type-discriminator**: use a `type` field + sparse fields per type, index `type` + commonly-queried fields.
+- **Time-series**: prefer MongoDB time-series collections (5.0+) or bucket pattern for high-frequency data.
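+
+Appending to a bucket is typically a single upsert. A minimal sketch, assuming bucket documents with `deviceId`, `bucketDate` (UTC midnight), `count`, and `readings` fields and a hypothetical cap of 1000 readings; `bucketUpsert` only builds the arguments for the driver's `collection.updateOne(filter, update, options)`.
+
+```javascript
+const MAX_READINGS = 1000; // hypothetical per-bucket cap
+
+// Truncate a timestamp to midnight UTC (the bucket boundary).
+function utcDay(ts) {
+  return new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
+}
+
+// When the day's bucket is full, `count: { $lt: MAX_READINGS }` stops matching, so the
+// upsert inserts a fresh bucket; only the equality fields (deviceId, bucketDate) are
+// copied into the new document, and $inc/$push initialize count and readings.
+function bucketUpsert(deviceId, reading) {
+  const filter = { deviceId, bucketDate: utcDay(reading.ts), count: { $lt: MAX_READINGS } };
+  const update = { $push: { readings: reading }, $inc: { count: 1 } };
+  return { filter, update, options: { upsert: true } };
+}
+```
+
+Usage with the Node.js driver would look like `const { filter, update, options } = bucketUpsert(id, reading); await metrics.updateOne(filter, update, options);`, where `metrics` is a hypothetical collection handle.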
+
+---
+
+## Aggregation pipeline optimization
+- **$match early**: filter as soon as possible to reduce pipeline volume.
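+- **$project early**: drop unneeded fields before heavy stages like `$group`. The optimizer already tracks which fields later stages need, so measure before adding `$project` purely for performance.
+- **Use indexes**: a `$match` on indexed fields at the start of the pipeline can use an index; later stages cannot.
+- **Avoid memory spikes**: each `$group` and blocking `$sort` stage is limited to 100 MB of RAM unless it can spill to disk, so shrink the input with `$match` first; a `$sort` followed by `$limit` keeps only the top N documents in memory.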
+- **Use `allowDiskUse:true`** for large jobs, but prefer pre-filtering.
+- **Pipeline ordering cheat sheet**: `$match` → `$project` → `$lookup` (if needed) → `$group` → `$sort` → `$limit`.
+
+```javascript
+// Example: aggregation to compute per-customer revenue (optimized)
+db.orders.aggregate([
+  { $match: { status: "completed", createdAt: { $gte: ISODate("2026-01-01") } } }, // indexed filters first
+  { $project: { customerId: 1, amount: 1, _id: 0 } }, // remove heavy/unneeded fields
+  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
+  { $sort: { total: -1 } },
+  { $limit: 100 }
+], { allowDiskUse: false });
+```
+
+---
+
+## ❌ Common anti-patterns to avoid
+- Storing unbounded arrays (e.g., activity logs) → use bucket pattern or separate collection
+- Using `$where` (server-side JavaScript evaluation) → cannot use indexes, slow, and an injection risk
+- Over-using `$lookup` in high-traffic aggregations → denormalize hot paths or pre-aggregate
+- Creating indexes on low-cardinality fields alone (e.g., `gender: 1`) → rarely selective, wastes write capacity
+- Ignoring the 16 MB document size limit → monitor with `$bsonSize` (aggregation, MongoDB 4.4+) or schema validation
+
+---
+
+## Transactions, consistency & sharding (practical notes)
+- Use multi-document transactions only when necessary: they add latency and complexity, and require a replica set or sharded cluster.
+- Favor single-document atomic operations (atomic by design) when possible.
+- For sharding:
+  - Choose shard key based on write and query patterns (avoid monotonically increasing keys like `createdAt` alone).
+  - Use hashed shard keys for even distribution if queries don't filter by range.
+  - Monitor chunk distribution and balancing via `sh.status()`.
+  - Test transactions/sharding in staging with representative data sizes.
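+
+When a multi-document transaction is justified, the Node.js driver's `session.withTransaction()` handles commit, abort, and retries of transient errors. A minimal sketch; the `bank` database, `accounts` collection, and `transfer` function are hypothetical, and `client` is assumed to be a connected `MongoClient` on a replica set or sharded cluster.
+
+```javascript
+// Move `amount` between two account documents; both updates commit or neither does.
+async function transfer(client, fromId, toId, amount) {
+  const accounts = client.db('bank').collection('accounts');
+  const session = client.startSession();
+  try {
+    await session.withTransaction(async () => {
+      // The balance guard makes the debit a no-op if funds are insufficient
+      const debit = await accounts.updateOne(
+        { _id: fromId, balance: { $gte: amount } },
+        { $inc: { balance: -amount } },
+        { session },
+      );
+      if (debit.modifiedCount !== 1) throw new Error('insufficient funds or unknown source account');
+      const credit = await accounts.updateOne({ _id: toId }, { $inc: { balance: amount } }, { session });
+      // Throwing inside the callback aborts the transaction, undoing the debit
+      if (credit.matchedCount !== 1) throw new Error('unknown destination account');
+    });
+  } finally {
+    await session.endSession();
+  }
+}
+```
+
+Note that the debit guard alone is a single-document atomic operation; the transaction is only needed because two documents must change together.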
+
+---
+
+## Production & security considerations
+- Never expose MongoDB directly to the internet — use private networks, VPC peering, or Atlas private endpoints.
+- Authentication & RBAC: enable SCRAM / x.509 and use least-privilege roles.
+- Field-level encryption: consider for highly sensitive fields (PII, tokens) using MongoDB Client-Side Field Level Encryption (CSFLE).
+- Secrets: store connection strings / credentials in environment variables or secret manager; do not embed in code.
+- TLS: require TLS for all connections (`tls=true` in connection string).
+- Backups & PITR: ensure regular backups; test restore procedures quarterly.
+- Audit & logging: enable audit logs for sensitive operations where compliance requires.
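+
+The secrets and TLS rules can be enforced at startup. A minimal sketch, assuming the connection string arrives in a hypothetical `MONGODB_URI` environment variable; it relies on the documented behavior that `mongodb+srv://` URIs enable TLS by default, and treats `ssl` as the legacy alias of `tls`.
+
+```javascript
+// Fail fast unless the connection string enables TLS. Option names are case-insensitive.
+function assertSecureMongoUri(uri) {
+  if (!uri) throw new Error('MONGODB_URI is not set');
+  const isSrv = uri.startsWith('mongodb+srv://');
+  if (!isSrv && !uri.startsWith('mongodb://')) throw new Error('not a MongoDB connection string');
+  const q = uri.indexOf('?');
+  const params = new URLSearchParams(q === -1 ? '' : uri.slice(q + 1));
+  const opts = new Map([...params].map(([k, v]) => [k.toLowerCase(), v.toLowerCase()]));
+  const flag = opts.get('tls') ?? opts.get('ssl'); // undefined when not specified
+  const tlsEnabled = flag === undefined ? isSrv : flag === 'true';
+  if (!tlsEnabled) throw new Error('TLS must be enabled: use mongodb+srv:// or add tls=true');
+  return uri;
+}
+```
+
+Call it before creating the client, e.g. `const client = new MongoClient(assertSecureMongoUri(process.env.MONGODB_URI));`, so a misconfigured deployment fails at boot instead of connecting in plaintext.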
+
+---
+
+## Monitoring & performance checks
+- Use the database profiler (results land in the `system.profile` collection) to find slow queries:  
+  `db.setProfilingLevel(1, { slowms: 100 })`
+- Monitor `db.serverStatus()` metrics: `opcounters`, `asserts`, `connections`, `mem`, `metrics.document`.
+- Regularly review index usage: `db.collection.aggregate([{ $indexStats: {} }])`.
+- Plan index changes after schema or query changes — test with `explain('executionStats')`.
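+
+Reading `explain('executionStats')` output by hand gets tedious, so a small helper can flag the usual red flags. A minimal sketch; `reviewExplain` and its ratio threshold of 10 are hypothetical choices, and it walks `inputStage`/`inputStages` (plus `queryPlan`, which newer server versions may nest under `winningPlan`) because the plan shape varies by version.
+
+```javascript
+// Collect stage names from a winning plan tree (field names vary by server version).
+function planStages(node, out = []) {
+  if (!node || typeof node !== 'object') return out;
+  if (node.stage) out.push(node.stage);
+  for (const child of [node.queryPlan, node.inputStage, ...(node.inputStages ?? [])]) {
+    planStages(child, out);
+  }
+  return out;
+}
+
+// Flag collection scans and a high docs-examined to docs-returned ratio.
+function reviewExplain(explain, maxRatio = 10) {
+  const { nReturned, totalDocsExamined } = explain.executionStats;
+  const stages = planStages(explain.queryPlanner.winningPlan);
+  const ratio = totalDocsExamined / Math.max(nReturned, 1);
+  const warnings = [];
+  if (stages.includes('COLLSCAN')) warnings.push('COLLSCAN: query did not use an index');
+  if (ratio > maxRatio) warnings.push(`examined ${totalDocsExamined} docs to return ${nReturned}`);
+  return { stages, ratio, warnings };
+}
+```
+
+With the Node.js driver, `await collection.find(filter).explain('executionStats')` returns a document of this shape that can be passed straight to `reviewExplain`.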
+
+---
+
+## Quick checklists (for PRs/Code Reviews)
+- [ ] Queries have appropriate filters and projections (no `find({})` without limits)
+- [ ] Critical fields are indexed (and compound indexes tested with real query patterns)
+- [ ] Aggregations place `$match` and `$project` early in pipeline
+- [ ] Large arrays or documents have outlier/bucket handling
+- [ ] No hardcoded connection strings or secrets in code
+- [ ] TLS and authentication are enforced in production config
+- [ ] Schema validation rules added for critical collections (optional but recommended)
+
+---
+
+## Example prompts (helps semantic router match this file)
+```
+"mongodb aggregation pipeline optimization"
+"optimize mongoose query performance"
+"mongodb ttl index sessions"
+"embedding vs referencing mongodb"
+"mongo shard key best practices"
+"mongodb compound index order best practices"
+"how to avoid aggregation memory limit"
+"mongoose populate vs embedding performance"
+"mongodb field level encryption example"
+"time-series collection vs bucket pattern"
+```