
Commit 15b2e6c

Reorganise the production docs (#2218)
* Reorganise the docs, make it clearer the steps needed to get Cortex into production.
  - Split out the capacity planning and schema docs.
  - Add words for HA pair handling.
  - Move all the production related docs into their own section.

Signed-off-by: Tom Wilkie <[email protected]>
1 parent dd23765 commit 15b2e6c

17 files changed: +261 -267 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Read the [getting started guide](https://cortexmetrics.io/docs/getting-started)
 project. Before deploying Cortex with a permanent storage backend you
 should read:
 1. [An overview of Cortex's architecture](https://cortexmetrics.io/docs/architecture/)
-1. [A general guide to running Cortex](https://cortexmetrics.io/docs/guides/running-in-production/)
+1. [A guide to running Cortex](https://cortexmetrics.io/docs/guides/production/)
 1. [Information regarding configuring Cortex](https://cortexmetrics.io/docs/configuration/arguments/)
 1. [Steps to run Cortex with Cassandra](https://cortexmetrics.io/docs/guides/cassandra/)

docs/_index.md

Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ project. Before deploying Cortex with a permanent storage backend you
 should read:
 
 1. [An overview of Cortex's architecture](architecture.md)
-1. [A general guide to running Cortex](running.md)
-1. [Information regarding configuring Cortex](arguments.md)
+1. [A guide to running Cortex](production/running.md)
+1. [Information regarding configuring Cortex](configuration/arguments.md)
 
 For a guide to contributing to Cortex, see the [contributor guidelines](contributing/).

docs/architecture.md

Lines changed: 8 additions & 8 deletions
@@ -1,7 +1,7 @@
 ---
 title: "Cortex Architecture"
 linkTitle: "Architecture"
-weight: 4
+weight: 2
 slug: architecture
 ---
 
@@ -48,7 +48,7 @@ Internally, the access to the chunks storage relies on a unified interface calle
 
 The chunk and index format are versioned, this allows Cortex operators to upgrade the cluster to take advantage of new features and improvements. This strategy enables changes in the storage format without requiring any downtime or complex procedures to rewrite the stored data. A set of schemas are used to map the version while reading and writing time series belonging to a specific period of time.
 
-The current schema recommendation is the **v10 schema** (v11 is still experimental). For more information about the schema, please check out the [Schema](guides/running.md#schema) documentation.
+The current schema recommendation is the **v10 schema** (v11 is still experimental). For more information about the schema, please check out the [Schema](configuration/schema-config-reference.md) documentation.
 
 ### Blocks storage (experimental)
 
@@ -107,7 +107,7 @@ The supported KV stores for the HA tracker are:
 * [Consul](https://www.consul.io)
 * [Etcd](https://etcd.io)
 
-For more information, please refer to [config for sending HA pairs data to Cortex](guides/ha-pair-handling.md) in the documentation.
+For more information, please refer to [config for sending HA pairs data to Cortex](production/ha-pair-handling.md) in the documentation.
 
 #### Hashing
 
@@ -223,11 +223,11 @@ The query frontend supports caching query results and reuses them on subsequent
 
 The **ruler** is an **optional service** executing PromQL queries for recording rules and alerts. The ruler requires a database storing the recording rules and alerts for each tenant.
 
-Ruler is **semi-stateful** and can be scaled horizontally.
-Running rules internally have state, as well as the ring the rulers initiate.
-However, if the rulers all fail and restart,
-Prometheus alert rules have a feature where an alert is restored and returned to a firing state
-if it would have been active in its for period.
+Ruler is **semi-stateful** and can be scaled horizontally.
+Running rules internally have state, as well as the ring the rulers initiate.
+However, if the rulers all fail and restart,
+Prometheus alert rules have a feature where an alert is restored and returned to a firing state
+if it would have been active in its for period.
 However, there would be gaps in the series generated by the recording rules.
 
 ### Alertmanager

docs/getting-started/_index.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: "Getting Started"
 linkTitle: "Getting Started"
-weight: 3
+weight: 1
 menu:
 ---

docs/guides/capacity-planning.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
---
title: "Capacity Planning"
linkTitle: "Capacity Planning"
weight: 104
slug: capacity-planning
---

You will want to estimate how many nodes are required, how many of
each component to run, and how much storage space will be required.
In practice, these will vary greatly depending on the metrics being
sent to Cortex.

Some key parameters are:

1. The number of active series. If you have Prometheus already you
   can query `prometheus_tsdb_head_series` to see this number.
2. Sampling rate, e.g. a new sample for each series every minute
   (the default Prometheus [scrape_interval](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)).
   Multiply this by the number of active series to get the
   total rate at which samples will arrive at Cortex.
3. The rate at which series are added and removed. This can be very
   high if you monitor objects that come and go - for example if you run
   thousands of batch jobs lasting a minute or so and capture metrics
   with a unique ID for each one. [Read how to analyse this on
   Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality).
4. How compressible the time-series data are. If a metric stays at
   the same value constantly, then Cortex can compress it very well, so
   12 hours of data sampled every 15 seconds would be around 2KB. On
   the other hand if the value jumps around a lot it might take 10KB.
   There are not currently any tools available to analyse this.
5. How long you want to retain data for, e.g. 1 month or 2 years.

Other parameters which can become important if you have particularly
high values:

6. Number of different series under one metric name.
7. Number of labels per series.
8. Rate and complexity of queries.

Now, some rules of thumb:

1. Each million series in an ingester takes 15GB of RAM. Total number
   of series in ingesters is number of active series times the
   replication factor. This is with the default of 12-hour chunks - RAM
   required will reduce if you set `-ingester.max-chunk-age` lower
   (trading off more back-end database IO).
2. Each million series (including churn) consumes 15GB of chunk
   storage and 4GB of index, per day (so multiply by the retention
   period).
3. Each 100,000 samples/sec arriving takes 1 CPU in distributors.
   Distributors don't need much RAM.

If you turn on compression between distributors and ingesters (for
example to save on inter-zone bandwidth charges at AWS/GCP) they will use
significantly more CPU (approx 100% more for distributor and 50% more
for ingester).
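
As a rough worked illustration of how these rules of thumb combine
(hypothetical numbers: 1 million active series, 15s scrape interval,
replication factor 3, 30-day retention, ignoring churn):

```
Series held in ingesters: 1M active x 3 replicas = 3M   -> ~45GB RAM across all ingesters
Incoming samples:         1M series / 15s ~= 67,000/s   -> ~1 distributor CPU (plus headroom)
Storage per day:          ~15GB chunks + ~4GB index     -> ~19GB/day
Storage for 30 days:      ~19GB/day x 30                -> ~570GB (more with churn)
```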

docs/guides/ingester-handover.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: "Ingester Hand-over"
 linkTitle: "Ingester Hand-over"
-weight: 5
+weight: 102
 slug: ingester-handover
 ---
 

docs/guides/kubernetes.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
---
title: "Running Cortex on Kubernetes"
linkTitle: "Running Cortex on Kubernetes"
weight: 100
slug: kubernetes
---

Because Cortex is designed to run multiple instances of each component
(ingester, querier, etc.), you probably want to automate the placement
and shepherding of these instances. Most users choose Kubernetes to do
this, but this is not mandatory.

## Configuration

### Resource requests

If using Kubernetes, each container should specify resource requests
so that the scheduler can place them on a node with sufficient capacity.

For example an ingester might request:

```
resources:
  requests:
    cpu: 4
    memory: 10Gi
```

The specific values here should be adjusted based on your own
experiences running Cortex - they are very dependent on rate of data
arriving and other factors such as series churn.

### Take extra care with ingesters

Ingesters hold hours of timeseries data in memory; you can configure
Cortex to replicate the data but you should take steps to avoid losing
all replicas at once:

- Don't run multiple ingesters on the same node.
- Don't run ingesters on preemptible/spot nodes.
- Spread out ingesters across racks / availability zones / whatever
  applies in your datacenters.

You can ask Kubernetes to avoid running on the same node like this:

```
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - ingester
        topologyKey: "kubernetes.io/hostname"
```
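
To also spread replicas across availability zones, as suggested above,
a similar anti-affinity term can use a zone-level topology key. This is
a sketch, assuming your nodes carry the standard
`topology.kubernetes.io/zone` label (older clusters use
`failure-domain.beta.kubernetes.io/zone`):

```
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - ingester
        topologyKey: "topology.kubernetes.io/zone"
```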

Give plenty of time for an ingester to hand over or flush data to
store when shutting down; for Kubernetes this looks like:

```
terminationGracePeriodSeconds: 2400
```

Ask Kubernetes to limit rolling updates to one ingester at a time, and
signal the old one to stop before the new one is ready:

```
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
```

Ingesters provide an HTTP hook to signal readiness when all is well;
this is valuable because it stops a rolling update at the first
problem:

```
readinessProbe:
  httpGet:
    path: /ready
    port: 80
```

We do not recommend configuring a liveness probe on ingesters -
killing them is a last resort and should not be left to a machine.
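
Putting the snippets above together, a minimal ingester Deployment
might look like the sketch below. This is illustrative only: the image
tag, arguments, labels and port are placeholders to adapt to your own
setup.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingester
spec:
  replicas: 3
  selector:
    matchLabels:
      name: ingester
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        name: ingester
    spec:
      terminationGracePeriodSeconds: 2400
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: name
                  operator: In
                  values:
                  - ingester
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: ingester
        image: quay.io/cortexproject/cortex:<version>  # placeholder tag
        args:
        - -target=ingester  # plus your storage and schema configuration flags
        resources:
          requests:
            cpu: 4
            memory: 10Gi
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
```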
