diff --git a/docs/operating/index.md b/docs/operating/index.md
index d783f6329..6be192e62 100644
--- a/docs/operating/index.md
+++ b/docs/operating/index.md
@@ -1,5 +1,5 @@
 ---
-title: Operating
+title: Operating Prometheus in Production
 sort_rank: 5
 nav_icon: settings
 ---
diff --git a/docs/operating/monitoring-prometheus.md b/docs/operating/monitoring-prometheus.md
new file mode 100644
index 000000000..48657f108
--- /dev/null
+++ b/docs/operating/monitoring-prometheus.md
@@ -0,0 +1,514 @@
+---
+title: Monitoring Prometheus
+sort_rank: 2
+---
+
+# Monitoring Prometheus
+
+Meta-monitoring (monitoring your monitoring system) is critical for production reliability. This guide covers essential metrics, alerting rules, and dashboards for monitoring the health of your Prometheus infrastructure.
+
+## Essential Prometheus Metrics
+
+### Memory and Performance Metrics
+
+```promql
+# Ingestion volume and memory pressure indicators
+prometheus_tsdb_head_samples_appended_total
+prometheus_tsdb_symbol_table_size_bytes
+
+# Active series and cardinality
+prometheus_tsdb_head_series
+prometheus_tsdb_head_chunks
+
+# Storage utilization
+prometheus_tsdb_blocks_loaded
+prometheus_tsdb_compactions_total
+prometheus_tsdb_compactions_failed_total
+```
+
+### Query Performance Monitoring
+
+```promql
+# Query latency. This metric is a summary, not a histogram, so the
+# quantiles are pre-computed and exposed via the "quantile" label.
+max by (slice) (
+  prometheus_engine_query_duration_seconds{quantile="0.9"}
+)
+
+# Concurrent queries
+prometheus_engine_queries_concurrent_max
+prometheus_engine_queries
+
+# Average query duration per slice
+rate(prometheus_engine_query_duration_seconds_sum[5m])
+  / rate(prometheus_engine_query_duration_seconds_count[5m])
+```
+
+### Ingestion and Scraping Health
+
+```promql
+# Samples ingested per second
+rate(prometheus_tsdb_head_samples_appended_total[5m])
+
+# Failed scrapes
+up == 0
+
+# Sample-limit violations and scrape duration
+prometheus_target_scrapes_exceeded_sample_limit_total
+prometheus_target_scrape_duration_seconds
+```
+
+### Storage Health
+
+```promql
+# WAL disk usage
+prometheus_tsdb_wal_fsync_duration_seconds +prometheus_tsdb_wal_corruptions_total + +# Compaction metrics +rate(prometheus_tsdb_compactions_total[5m]) +prometheus_tsdb_compactions_failed_total + +# Block loading issues +prometheus_tsdb_blocks_loaded +prometheus_tsdb_head_truncations_failed_total +``` + +## Critical Alerting Rules + +### **Prometheus Monitoring Mixins** + +Instead of maintaining alerting rules inline (which can become outdated), we recommend using the official Prometheus monitoring mixins that are maintained alongside the codebase: + +**📋 Official Prometheus Monitoring Mixin** +- **Repository**: [prometheus/prometheus](https://github.com/prometheus/prometheus/tree/main/documentation/prometheus-mixin) +- **Maintained**: Versioned with Prometheus releases +- **Coverage**: Production-ready alerts for Prometheus infrastructure health +- **Installation**: Follow the mixin documentation for your environment + +**Key Alert Categories Covered**: +- Prometheus instance health and availability +- High memory usage and resource constraints +- Query performance and latency issues +- Storage and WAL-related problems +- Target scraping failures and connectivity + +**🔗 Additional Community Mixins**: +- [monitoring-mixins/prometheus-mixin](https://monitoring.mixins.dev/prometheus/) - Community-maintained alerts +- [grafana/jsonnet-libs](https://github.com/grafana/jsonnet-libs) - Grafana Labs mixins + +### **Example Custom Alerting Rules** + +For organizations needing custom alerts beyond the mixins, here are example patterns. 
**Note**: These are templates that should be adapted and tested for your specific environment: + +```yaml +# Example: Custom capacity planning alerts +# ⚠️ Disclaimer: Test thoroughly in your environment before production use +groups: +- name: prometheus.capacity.examples + rules: + - alert: PrometheusHighMemoryUsageCustom + expr: | + ( + process_resident_memory_bytes{job="prometheus"} / + (1024^3) # Convert to GB + ) > 8 # Adjust threshold for your deployment + for: 15m + labels: + severity: warning + annotations: + summary: "Prometheus {{ $labels.instance }} memory usage is high" + description: "Memory usage is {{ $value }}GB, consider scaling or optimization." + + - alert: PrometheusIngestionRateIncreasing + expr: | + predict_linear( + rate(prometheus_tsdb_head_samples_appended_total[1h])[4h:], + 24*3600 + ) > 50000 # Adjust based on your capacity + for: 30m + labels: + severity: warning + annotations: + summary: "Prometheus ingestion rate trending high" + description: "Predicted to exceed 50k samples/sec within 24 hours." 
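+
+  # Additional example template (not from the official mixin): WAL corruptions
+  # should stay at zero, so any increase is worth investigating immediately.
+  # Adapt the lookback window and severity to your environment.
+  - alert: PrometheusWALCorruptionsCustom
+    expr: increase(prometheus_tsdb_wal_corruptions_total[3h]) > 0
+    for: 0m
+    labels:
+      severity: critical
+    annotations:
+      summary: "Prometheus {{ $labels.instance }} reported WAL corruptions"
+      description: "{{ $value }} WAL corruption(s) observed in the last 3 hours. Check disk health and the Prometheus logs."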
+``` + +**📝 Important Notes**: +- These are **example templates** - adapt thresholds for your environment +- Test thoroughly before deploying to production +- Consider contributing improvements back to the official mixins + +## Monitoring Dashboard + +### Grafana Dashboard JSON + +```json +{ + "dashboard": { + "title": "Prometheus Overview", + "panels": [ + { + "title": "Prometheus Instances Status", + "type": "stat", + "targets": [ + { + "expr": "up{job=\"prometheus\"}", + "legendFormat": "{{ instance }}" + } + ] + }, + { + "title": "Memory Usage", + "type": "graph", + "targets": [ + { + "expr": "process_resident_memory_bytes{job=\"prometheus\"}", + "legendFormat": "RSS Memory - {{ instance }}" + }, + { + "expr": "process_virtual_memory_bytes{job=\"prometheus\"}", + "legendFormat": "Virtual Memory - {{ instance }}" + } + ] + }, + { + "title": "Query Performance", + "type": "graph", + "targets": [ + { + "expr": "histogram_quantile(0.95, rate(prometheus_engine_query_duration_seconds_bucket[5m]))", + "legendFormat": "95th percentile" + }, + { + "expr": "histogram_quantile(0.50, rate(prometheus_engine_query_duration_seconds_bucket[5m]))", + "legendFormat": "50th percentile" + } + ] + }, + { + "title": "Active Series", + "type": "graph", + "targets": [ + { + "expr": "prometheus_tsdb_head_series", + "legendFormat": "{{ instance }}" + } + ] + }, + { + "title": "Ingestion Rate", + "type": "graph", + "targets": [ + { + "expr": "rate(prometheus_tsdb_head_samples_appended_total[5m])", + "legendFormat": "Samples/sec - {{ instance }}" + } + ] + }, + { + "title": "Storage Usage", + "type": "graph", + "targets": [ + { + "expr": "prometheus_tsdb_blocks_loaded", + "legendFormat": "Blocks Loaded - {{ instance }}" + } + ] + } + ] + } +} +``` + +## Health Check Endpoints + +### **Example HTTP Health Checks** + +The following are example scripts for monitoring Prometheus health endpoints. 
**⚠️ Disclaimer**: These are templates that should be tested and adapted for your specific environment - no CI validates these scripts. + +```bash +#!/bin/bash +# example-prometheus-health-check.sh +# ⚠️ Test thoroughly in your environment before production use + +PROMETHEUS_URL="http://localhost:9090" + +# Basic health check +echo "=== Basic Health Check ===" +curl -s "$PROMETHEUS_URL/-/healthy" || echo "Health check failed" + +# Readiness check +echo "=== Readiness Check ===" +curl -s "$PROMETHEUS_URL/-/ready" || echo "Readiness check failed" + +# Configuration reload status +echo "=== Configuration Status ===" +CONFIG_STATUS=$(curl -s "$PROMETHEUS_URL/api/v1/status/config" | jq '.status') +echo "Config reload status: $CONFIG_STATUS" + +# Target status +echo "=== Target Status ===" +UP_TARGETS=$(curl -s "$PROMETHEUS_URL/api/v1/targets" | jq '.data.activeTargets | map(select(.health == "up")) | length') +TOTAL_TARGETS=$(curl -s "$PROMETHEUS_URL/api/v1/targets" | jq '.data.activeTargets | length') +echo "Healthy targets: $UP_TARGETS/$TOTAL_TARGETS" + +# Runtime information +echo "=== Runtime Information ===" +curl -s "$PROMETHEUS_URL/api/v1/status/runtimeinfo" | jq '.' 
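+
+# TSDB head statistics (series count, chunk count, top metrics by
+# cardinality) via the /api/v1/status/tsdb endpoint. Note: this endpoint
+# requires a reasonably recent Prometheus release; drop this check if your
+# version does not expose it.
+echo "=== TSDB Head Stats ==="
+curl -s "$PROMETHEUS_URL/api/v1/status/tsdb" | jq '.data.headStats'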
+``` + +**📝 Usage Notes**: +- Requires `curl` and `jq` to be installed +- Adjust `PROMETHEUS_URL` for your deployment +- Consider adding authentication headers if Prometheus is secured +- Test timeout and error handling for your environment + +### Kubernetes Health Checks + +```yaml +# Kubernetes probes for Prometheus StatefulSet +livenessProbe: + httpGet: + path: /-/healthy + port: 9090 + initialDelaySeconds: 30 + periodSeconds: 15 + timeoutSeconds: 10 + failureThreshold: 3 + +readinessProbe: + httpGet: + path: /-/ready + port: 9090 + initialDelaySeconds: 30 + periodSeconds: 5 + timeoutSeconds: 5 + failureThreshold: 3 +``` + +## Performance Monitoring Queries + +### Memory Analysis + +```promql +# Top metrics by memory usage +topk(10, + prometheus_tsdb_symbol_table_size_bytes + + prometheus_tsdb_head_chunks_bytes +) + +# Memory usage by component +sum by (job) (process_resident_memory_bytes{job="prometheus"}) + +# Memory growth rate +increase(process_resident_memory_bytes{job="prometheus"}[1h]) +``` + +### Query Analysis + +```promql +# Most expensive queries by duration +topk(10, + rate(prometheus_engine_query_duration_seconds_sum[5m]) / + rate(prometheus_engine_query_duration_seconds_count[5m]) +) + +# Query concurrency +prometheus_engine_queries_concurrent_max + +# Failed queries +rate(prometheus_engine_queries_total{result="error"}[5m]) +``` + +### Storage Analysis + +```promql +# WAL size growth +increase(prometheus_tsdb_wal_segment_current[1h]) + +# Compaction duration +prometheus_tsdb_compaction_duration_seconds + +# Block size distribution +histogram_quantile(0.95, prometheus_tsdb_compaction_chunk_size_bytes_bucket) +``` + +## Automated Monitoring Scripts + +### Daily Health Report + +```bash +#!/bin/bash +# daily-prometheus-report.sh + +PROMETHEUS_URL="http://localhost:9090" +REPORT_DATE=$(date +%Y-%m-%d) +REPORT_FILE="/var/log/prometheus/daily-report-$REPORT_DATE.txt" + +echo "Prometheus Daily Health Report - $REPORT_DATE" > $REPORT_FILE +echo 
"================================================" >> $REPORT_FILE + +# Instance status +echo "Instance Status:" >> $REPORT_FILE +curl -s "$PROMETHEUS_URL/api/v1/query?query=up{job=\"prometheus\"}" | \ + jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"' >> $REPORT_FILE + +# Memory usage +echo -e "\nMemory Usage (GB):" >> $REPORT_FILE +curl -s "$PROMETHEUS_URL/api/v1/query?query=process_resident_memory_bytes{job=\"prometheus\"}/1024/1024/1024" | \ + jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"' >> $REPORT_FILE + +# Active series +echo -e "\nActive Series:" >> $REPORT_FILE +curl -s "$PROMETHEUS_URL/api/v1/query?query=prometheus_tsdb_head_series" | \ + jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"' >> $REPORT_FILE + +# Query performance +echo -e "\nQuery Performance (95th percentile, seconds):" >> $REPORT_FILE +curl -s "$PROMETHEUS_URL/api/v1/query?query=histogram_quantile(0.95, rate(prometheus_engine_query_duration_seconds_bucket[24h]))" | \ + jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"' >> $REPORT_FILE + +# Failed scrapes +echo -e "\nFailed Scrapes:" >> $REPORT_FILE +curl -s "$PROMETHEUS_URL/api/v1/query?query=count by (job) (up == 0)" | \ + jq -r '.data.result[] | "\(.metric.job): \(.value[1])"' >> $REPORT_FILE + +echo "Report generated: $REPORT_FILE" +``` + +### Capacity Planning Script + +```bash +#!/bin/bash +# capacity-planning.sh + +PROMETHEUS_URL="http://localhost:9090" + +echo "Prometheus Capacity Planning Report" +echo "==================================" + +# Current metrics +CURRENT_SERIES=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=prometheus_tsdb_head_series" | jq '.data.result[0].value[1] | tonumber') +CURRENT_MEMORY=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=process_resident_memory_bytes{job=\"prometheus\"}" | jq '.data.result[0].value[1] | tonumber') +INGESTION_RATE=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=rate(prometheus_tsdb_head_samples_appended_total[1h])" | jq 
'.data.result[0].value[1] | tonumber') + +echo "Current active series: $CURRENT_SERIES" +echo "Current memory usage: $(echo "$CURRENT_MEMORY / 1024 / 1024 / 1024" | bc) GB" +echo "Current ingestion rate: $(echo "$INGESTION_RATE" | bc) samples/sec" + +# Projected growth (30 days) +PROJECTED_SERIES=$(echo "$CURRENT_SERIES * 1.1" | bc) # 10% growth +PROJECTED_MEMORY=$(echo "$CURRENT_MEMORY * 1.1" | bc) + +echo -e "\nProjected in 30 days (10% growth):" +echo "Projected series: $PROJECTED_SERIES" +echo "Projected memory: $(echo "$PROJECTED_MEMORY / 1024 / 1024 / 1024" | bc) GB" + +# Recommendations +if (( $(echo "$CURRENT_SERIES > 500000" | bc -l) )); then + echo -e "\nRecommendation: Consider horizontal scaling or series optimization" +fi + +if (( $(echo "$CURRENT_MEMORY > 8589934592" | bc -l) )); then # 8GB + echo -e "\nRecommendation: Monitor memory usage closely, consider memory optimization" +fi +``` + +## Log Analysis + +### Important Log Patterns + +```bash +# Monitor Prometheus logs for issues +tail -f /var/log/prometheus/prometheus.log | grep -E "(error|warn|panic|fatal)" + +# Common error patterns to watch for: +# - "out of memory" +# - "too many open files" +# - "context deadline exceeded" +# - "compaction failed" +# - "WAL corruption" +``` + +### Log Aggregation Query (if using Loki) + +```logql +# Prometheus error analysis +{job="prometheus"} |= "error" | json | line_format "{{ .level }}: {{ .msg }}" + +# Memory pressure indicators +{job="prometheus"} |~ "memory|OOM|out of memory" + +# Query performance issues +{job="prometheus"} |~ "slow|timeout|deadline exceeded" +``` + +## Troubleshooting Playbook + +### High Memory Usage + +1. **Check active series**: `prometheus_tsdb_head_series` +2. **Identify high-cardinality metrics**: Use cardinality analysis queries +3. **Review scrape configurations**: Look for unnecessary labels +4. **Consider series dropping**: Use `metric_relabel_configs` + +### Slow Queries + +1. 
**Enable query logging**: set the `query_log_file` option in the `global` section of the configuration file (this is a reloadable config option, not a command-line flag)
+2. **Analyze query patterns**: Review the most expensive queries
+3. **Optimize query structure**: Use recording rules for complex queries
+4. **Increase the query timeout**: `--query.timeout`, if appropriate
+
+### Storage Issues
+
+1. **Check disk space**: Monitor filesystem usage
+2. **Review retention settings**: Adjust retention time/size
+3. **Monitor compaction**: Check for failed compactions
+4. **WAL monitoring**: Watch WAL size growth
+
+## Integration with External Monitoring
+
+### Exporting Metrics to Another Prometheus
+
+```yaml
+# Remote write configuration for meta-monitoring
+remote_write:
+  - url: "http://meta-prometheus:9090/api/v1/write"
+    queue_config:
+      capacity: 10000
+      max_samples_per_send: 1000
+    write_relabel_configs:
+      - source_labels: [__name__]
+        regex: "prometheus_.*"
+        action: keep
+```
+
+### Alertmanager Integration
+
+```yaml
+# Alertmanager configuration for Prometheus alerts
+route:
+  group_by: ['alertname', 'instance']
+  group_wait: 10s
+  group_interval: 10s
+  repeat_interval: 1h
+  receiver: 'prometheus-alerts'
+  routes:
+    - match:
+        severity: critical
+      receiver: 'prometheus-critical'
+
+receivers:
+- name: 'prometheus-alerts'
+  slack_configs:
+    - api_url: 'YOUR_SLACK_WEBHOOK'
+      channel: '#prometheus-alerts'
+
+- name: 'prometheus-critical'
+  pagerduty_configs:
+    - service_key: 'YOUR_PAGERDUTY_KEY'
+```
+
+---
+
+This setup keeps your Prometheus infrastructure itself observable. Regularly reviewing these metrics and alerts will help you maintain reliable monitoring for your production environments.
\ No newline at end of file diff --git a/docs/operating/production-deployment.md b/docs/operating/production-deployment.md new file mode 100644 index 000000000..78339ff2c --- /dev/null +++ b/docs/operating/production-deployment.md @@ -0,0 +1,497 @@ +--- +title: Production Deployment Guide +sort_rank: 1 +--- + +# Production Deployment Guide + +This guide provides comprehensive recommendations for deploying Prometheus in production environments. It covers hardware requirements, high availability patterns, configuration best practices, and operational considerations for running Prometheus at scale. + +## Hardware and Infrastructure Requirements + +### Server Specifications + +**Memory Requirements** +- **Minimum**: 4GB RAM for small deployments (< 10k active series) +- **Recommended**: 16-32GB RAM for medium deployments (10k-100k active series) +- **Large Scale**: 64GB+ RAM for large deployments (100k+ active series) + +**CPU Requirements** +- **Minimum**: 2 CPU cores +- **Recommended**: 4-8 CPU cores for most production workloads +- **Large Scale**: 16+ CPU cores for high-cardinality environments + +**Storage Requirements** +- **SSD strongly recommended** for data directory +- **Disk space calculation**: `retention_days * daily_ingestion_rate * compression_ratio` + - Typical compression ratio: 1.5-3x + - Example: 30 days * 1GB/day * 2 = 60GB storage needed +- **Separate disk** for WAL (Write-Ahead Log) recommended for high-throughput deployments + +### Network Considerations + +```yaml +# Recommended firewall rules +ingress: + - port: 9090 # Prometheus web UI and API + protocol: TCP + sources: ["monitoring-subnet", "admin-subnet"] + + - port: 9091 # Pushgateway (if used) + protocol: TCP + sources: ["application-subnets"] + +egress: + - port: 80/443 # Scraping HTTP/HTTPS targets + protocol: TCP + destinations: ["0.0.0.0/0"] + + - port: 9100 # Node exporter + protocol: TCP + destinations: ["infrastructure-subnets"] +``` + +## High Availability Deployment Patterns + 
+### Active-Active Configuration + +Deploy multiple identical Prometheus instances scraping the same targets: + +```yaml +# prometheus-1.yml +global: + scrape_interval: 15s + evaluation_interval: 15s + external_labels: + replica: 'prometheus-1' + cluster: 'production' + +scrape_configs: + - job_name: 'application-servers' + static_configs: + - targets: ['app1:8080', 'app2:8080', 'app3:8080'] +``` + +```yaml +# prometheus-2.yml +global: + scrape_interval: 15s + evaluation_interval: 15s + external_labels: + replica: 'prometheus-2' + cluster: 'production' + +scrape_configs: + - job_name: 'application-servers' + static_configs: + - targets: ['app1:8080', 'app2:8080', 'app3:8080'] +``` + +**Benefits:** +- No single point of failure +- Load distribution for queries +- Natural data redundancy + +**Considerations:** +- Requires deduplication in query layer (Thanos, Cortex, or VictoriaMetrics) +- Double storage requirements +- Alert rule evaluation happens on both instances + +### Federation for Hierarchical Scaling + +```yaml +# Global Prometheus configuration +scrape_configs: + - job_name: 'prometheus-federation' + scrape_interval: 15s + honor_labels: true + metrics_path: '/federate' + params: + 'match[]': + - '{job=~"prometheus|node|kubernetes-.*"}' + - 'up' + - 'prometheus_build_info' + static_configs: + - targets: + - 'prometheus-region-us-east:9090' + - 'prometheus-region-us-west:9090' + - 'prometheus-region-eu:9090' +``` + +## Production Configuration Best Practices + +### Storage Configuration + +```yaml +# Command line flags for storage optimization +--storage.tsdb.path=/prometheus/data +--storage.tsdb.retention.time=30d +--storage.tsdb.retention.size=100GB +--storage.tsdb.wal-compression +--storage.tsdb.no-lockfile +--web.enable-lifecycle +--web.enable-admin-api +``` + +### Memory Optimization + +```yaml +# Limit memory usage and optimize for large deployments +--storage.tsdb.head-chunks-write-queue-size=10000 +--query.max-concurrency=20 +--query.timeout=2m 
+--query.max-samples=50000000 +``` + +### Sample Configuration File + +```yaml +# /etc/prometheus/prometheus.yml +global: + scrape_interval: 30s + scrape_timeout: 10s + evaluation_interval: 30s + external_labels: + environment: 'production' + datacenter: 'us-east-1' + +rule_files: + - "/etc/prometheus/rules/*.yml" + +alerting: + alertmanagers: + - static_configs: + - targets: + - alertmanager-1:9093 + - alertmanager-2:9093 + timeout: 10s + +scrape_configs: + # Prometheus itself + - job_name: 'prometheus' + static_configs: + - targets: ['localhost:9090'] + scrape_interval: 30s + metrics_path: /metrics + + # Node exporter for system metrics + - job_name: 'node-exporter' + static_configs: + - targets: + - 'node1:9100' + - 'node2:9100' + - 'node3:9100' + scrape_interval: 30s + + # Application metrics + - job_name: 'application' + static_configs: + - targets: + - 'app1:8080' + - 'app2:8080' + scrape_interval: 15s + metrics_path: /metrics + scrape_timeout: 10s + +# Remote write for long-term storage (optional) +remote_write: + - url: "https://remote-storage-endpoint/api/v1/write" + queue_config: + capacity: 2500 + max_shards: 200 + min_shards: 1 + max_samples_per_send: 500 + batch_send_deadline: 5s +``` + +## Container Deployment + +### **Official Deployment Examples** + +For production-ready deployment configurations, we recommend using the official examples that are maintained and tested: + +**📁 Prometheus Examples Repository** +- **Location**: [prometheus/prometheus/documentation/examples](https://github.com/prometheus/prometheus/tree/main/documentation/examples) +- **Maintained**: Versioned with Prometheus releases +- **Tested**: Validated configurations for various deployment scenarios + +### **Docker Configuration** + +**📋 Basic Docker Setup Example** + +```dockerfile +# Example Dockerfile for production Prometheus +FROM prom/prometheus:latest + +# Copy configuration +COPY prometheus.yml /etc/prometheus/ +COPY rules/ /etc/prometheus/rules/ + +# Set proper ownership 
+USER root +RUN chown -R prometheus:prometheus /etc/prometheus/ +USER prometheus + +# Expose metrics port +EXPOSE 9090 + +# Use proper entrypoint with production flags +ENTRYPOINT ["/bin/prometheus", \ + "--config.file=/etc/prometheus/prometheus.yml", \ + "--storage.tsdb.path=/prometheus", \ + "--storage.tsdb.retention.time=30d", \ + "--storage.tsdb.wal-compression", \ + "--web.console.libraries=/etc/prometheus/console_libraries", \ + "--web.console.templates=/etc/prometheus/consoles", \ + "--web.enable-lifecycle", \ + "--web.external-url=https://prometheus.company.com"] +``` + +### **Kubernetes Deployment** + +**📋 Recommended Approach**: Use official Helm charts or kustomize examples + +**Official Resources**: +- **Prometheus Community Helm Chart**: [prometheus-community/helm-charts](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus) +- **Prometheus Operator**: [prometheus-operator/prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) +- **Official Examples**: [prometheus/prometheus examples](https://github.com/prometheus/prometheus/tree/main/documentation/examples) + +**📝 Key Kubernetes Considerations**: +- Use StatefulSets for data persistence +- Configure proper resource requests and limits +- Set up horizontal pod autoscaling carefully +- Use persistent volumes for data storage +- Configure proper security contexts +- Set up monitoring and alerting for the Kubernetes deployment itself + +**Example Resource Requirements**: +```yaml +# Example resource configuration - adjust for your needs +resources: + requests: + memory: "2Gi" + cpu: "500m" + limits: + memory: "4Gi" + cpu: "2" +``` + +### **High Availability with Helm** + +For production HA deployments, consider the prometheus-community Helm chart with these key configurations: + +```bash +# Example Helm installation with HA configuration +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts +helm repo update + +# Install 
with custom values for HA
helm install prometheus prometheus-community/prometheus \
  --set server.replicaCount=2 \
  --set server.persistentVolume.size=100Gi \
  --set server.retention=30d \
  --namespace monitoring \
  --create-namespace
+```
+
+**📋 Important**: Always customize the values.yaml file for your specific requirements. See the [official chart documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus) for all available options.
+
+## Security Hardening
+
+### Authentication and Authorization
+
+```yaml
+# Web configuration file (passed to Prometheus via --web.config.file)
+
+# Basic auth users (bcrypt-hashed passwords)
+basic_auth_users:
+  admin: $2a$10$hYoOolb6tZyZQkEJ8T8jIuJ6U.4FK/8e8cDatYQ8F5U0QKa.4QKyC
+  readonly: $2a$10$ZoOJlGqEEzOz5T8uFX5c8elZeT3cxBE8XuqD8qJ2z9F5x8c4U6Ty6
+
+# TLS configuration
+tls_server_config:
+  cert_file: /etc/prometheus/tls/server.crt
+  key_file: /etc/prometheus/tls/server.key
+  client_ca_file: /etc/prometheus/tls/ca.crt
+  client_auth_type: RequireAndVerifyClientCert
+```
+
+### Network Security
+
+```bash
+# Firewall rules using iptables
+# Allow Prometheus web interface from monitoring subnet only
+iptables -A INPUT -p tcp --dport 9090 -s 10.0.1.0/24 -j ACCEPT
+iptables -A INPUT -p tcp --dport 9090 -j DROP
+
+# Allow scraping from Prometheus to targets
+iptables -A OUTPUT -p tcp --dport 9100 -d 10.0.0.0/16 -j ACCEPT
+iptables -A OUTPUT -p tcp --dport 8080 -d 10.0.0.0/16 -j ACCEPT
+```
+
+## Monitoring Prometheus Performance
+
+Essential metrics to monitor for Prometheus health:
+
+```promql
+# Ingestion and memory
+prometheus_tsdb_head_samples_appended_total
+prometheus_tsdb_symbol_table_size_bytes
+
+# Storage metrics
+prometheus_tsdb_blocks_loaded
+prometheus_tsdb_compactions_total
+prometheus_tsdb_head_series
+
+# Query performance
+prometheus_engine_query_duration_seconds
+prometheus_engine_queries_concurrent_max
+```
+
+## Backup and Disaster Recovery
+
+### Snapshot-based Backup
+
+```bash
+#!/bin/bash
+# 
backup-prometheus.sh + +PROMETHEUS_URL="http://localhost:9090" +BACKUP_DIR="/backup/prometheus" +DATE=$(date +%Y%m%d_%H%M%S) + +# Create snapshot +curl -XPOST $PROMETHEUS_URL/api/v1/admin/tsdb/snapshot + +# Get snapshot name +SNAPSHOT=$(ls -t /prometheus/snapshots/ | head -1) + +# Copy snapshot to backup location +mkdir -p $BACKUP_DIR/$DATE +cp -r /prometheus/snapshots/$SNAPSHOT $BACKUP_DIR/$DATE/ + +# Compress backup +tar -czf $BACKUP_DIR/prometheus_backup_$DATE.tar.gz -C $BACKUP_DIR/$DATE . + +# Clean up old backups (keep 30 days) +find $BACKUP_DIR -name "*.tar.gz" -mtime +30 -delete + +echo "Backup completed: $BACKUP_DIR/prometheus_backup_$DATE.tar.gz" +``` + +### Recovery Procedure + +```bash +#!/bin/bash +# restore-prometheus.sh + +BACKUP_FILE="$1" +PROMETHEUS_DATA_DIR="/prometheus" + +if [ -z "$BACKUP_FILE" ]; then + echo "Usage: $0 " + exit 1 +fi + +# Stop Prometheus +systemctl stop prometheus + +# Backup current data +mv $PROMETHEUS_DATA_DIR $PROMETHEUS_DATA_DIR.backup.$(date +%s) + +# Extract backup +mkdir -p $PROMETHEUS_DATA_DIR +tar -xzf $BACKUP_FILE -C $PROMETHEUS_DATA_DIR + +# Set proper permissions +chown -R prometheus:prometheus $PROMETHEUS_DATA_DIR + +# Start Prometheus +systemctl start prometheus + +echo "Recovery completed from $BACKUP_FILE" +``` + +## Performance Tuning + +### Memory Optimization + +```bash +# JVM-style memory flags for Go garbage collection +export GOGC=100 # Default garbage collection target +export GOMEMLIMIT=8GiB # Set memory limit (Go 1.19+) + +# Start Prometheus with memory optimizations +prometheus \ + --storage.tsdb.head-chunks-write-queue-size=10000 \ + --query.max-concurrency=20 \ + --storage.tsdb.min-block-duration=2h \ + --storage.tsdb.max-block-duration=2h +``` + +### Storage Optimization + +```yaml +# Reduce cardinality by dropping unnecessary labels +metric_relabel_configs: + - source_labels: [__name__] + regex: 'go_.*' + action: drop + - source_labels: [instance] + regex: '(.*):[0-9]+' + target_label: instance + 
replacement: '${1}'
+```
+
+## Troubleshooting Common Issues
+
+### High Memory Usage
+
+```promql
+# Check for high cardinality series
+topk(10, count by (__name__)({__name__=~".+"}))
+
+# Identify sources of cardinality
+prometheus_tsdb_symbol_table_size_bytes
+prometheus_tsdb_head_series
+```
+
+### Slow Queries
+
+```promql
+# Monitor query performance
+rate(prometheus_engine_query_duration_seconds_sum[5m]) /
+rate(prometheus_engine_query_duration_seconds_count[5m])
+
+# Check for expensive queries
+prometheus_engine_queries_concurrent_max
+```
+
+### Storage Issues
+
+```bash
+# Check disk space
+df -h /prometheus
+
+# Monitor WAL size
+du -sh /prometheus/wal/
+
+# Check for missing or corrupted blocks: compare the value of the
+# prometheus_tsdb_blocks_loaded metric against the number of block
+# directories (ULID-named) on disk
+ls -d /prometheus/*/ | grep -vE 'wal|chunks_head|snapshots' | wc -l
+```
+
+## Next Steps
+
+After deploying Prometheus in production:
+
+1. Set up [monitoring of Prometheus itself](monitoring-prometheus/)
+2. Configure [alerting rules](../practices/alerting.md)
+3. Implement [backup procedures](backup-recovery/)
+4. Review [security configurations](security.md)
+5. Plan for [scaling and performance tuning](performance-tuning/)
+
+---
+
+**Additional Resources:**
+- [Prometheus Configuration Reference](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)
+- [Storage Documentation](https://prometheus.io/docs/prometheus/latest/storage/)
+- [Best Practices](../practices/)
\ No newline at end of file