Commit df0ae01: more wordsmithing
1 parent f4388c6

3 files changed: +79 additions, -66 deletions

arch.rst

Lines changed: 16 additions & 17 deletions
@@ -42,8 +42,8 @@ Platform-as-a-Service (PaaS).
 
 Aether supports this combination by implementing both the RAN and the
 user plane of the Mobile Core on-prem, as cloud-native workloads
-co-located on the Aether cluster. This is often referred to as local
-breakout because it enables direct communication between mobile
+co-located on the Aether cluster. This is often referred to as *local
+breakout* because it enables direct communication between mobile
 devices and edge applications without data traffic leaving the
 enterprise. This scenario is depicted in :numref:`Figure %s
 <fig-hybrid>`, which does not name the edge applications, but

@@ -62,7 +62,7 @@ example.
 
 The approach includes both edge (on-prem) and centralized (off-prem)
 components. This is true for edge apps, which often have a centralized
-counterpart running in a commodity cloud. It is also true for the
+counterpart running in a commodity cloud. It is also true for the 5G
 Mobile Core, where the on-prem User Plane (UP) is paired with a
 centralized Control Plane (CP). The central cloud shown in this figure
 might be private (i.e., operated by the enterprise), public (i.e.,

@@ -72,9 +72,9 @@ cloud). Also shown in :numref:`Figure %s <fig-hybrid>` is a
 centralized *Control and Management Platform*. This represents all the
 functionality needed to offer Aether as a managed service, with system
 administrators using a portal exported by this platform to operate the
-underlying infrastructure and services. The rest of this book is about
-everything that goes into implementing that *Control and Management
-Platform*.
+underlying infrastructure and services within their enterprise. The
+rest of this book is about everything that goes into implementing that
+*Control and Management Platform*.
 
 2.1 Edge Cloud
 --------------

@@ -112,8 +112,8 @@ the SD-Fabric), are deployed as a set of microservices, but details
 about the functionality implemented by these containers is otherwise
 not critical to this discussion. For our purposes, they are
 representative of any cloud native workload. (The interested reader is
-referred to our 5G and SDN books for more information about the
-internal working of SD-RAN, SD-Core, and SD-Fabric.)
+referred to our companion 5G and SDN books for more information about
+the internal working of SD-RAN, SD-Core, and SD-Fabric.)
 
 .. _reading_5g:
 .. admonition:: Further Reading

@@ -151,8 +151,8 @@ Platform (AMP).
 Each SD-Core CP controls one or more SD-Core UPs, as specified by
 3GPP, the standards organization responsible for 5G. Exactly how CP
 instances (running centrally) are paired with UP instances (running at
-the edges) is a configuration-time decision, and depends on the degree
-of isolation the enterprise sites require. AMP is responsible for
+the edges) is a runtime decision, and depends on the degree of
+isolation the enterprise sites require. AMP is responsible for
 managing all the centralized and edge subsystems (as introduced in the
 next section).
 

@@ -173,12 +173,12 @@ we started with in :numref:`Figure %s <fig-hw>` of Chapter 1).\ [#]_
 This is because, while each ACE site usually corresponds to a physical
 cluster built out of bare-metal components, each of the SD-Core CP
 subsystems shown in :numref:`Figure %s <fig-aether>` is actually
-deployed as a logical Kubernetes cluster on a commodity cloud. The
+deployed in a logical Kubernetes cluster on a commodity cloud. The
 same is true for AMP. Aether’s centralized components are able to run
 in Google Cloud Platform, Microsoft Azure, and Amazon’s AWS. They also
 run as an emulated cluster implemented by a system like
 KIND—Kubernetes in Docker—making it possible for developers to run
-these components on a laptop.
+these components on their laptop.
 
 .. [#] Confusingly, Kubernetes adopts generic terminology, such as
    “cluster” and “service”, and gives it very specific meaning. In

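The KIND workflow this hunk refers to is easy to reproduce. Below is a minimal sketch, assuming only that the ``kind`` CLI is installed; the cluster name and node layout are placeholders rather than Aether's actual configuration::

   # kind-cluster.yaml: a hypothetical two-node cluster for laptop development.
   # Create it with: kind create cluster --config kind-cluster.yaml
   kind: Cluster
   apiVersion: kind.x-k8s.io/v1alpha4
   name: aether-dev          # placeholder name, not taken from the Aether repos
   nodes:
     - role: control-plane
     - role: worker
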
@@ -190,8 +190,7 @@ these components on a laptop.
    potentially thousands of such logical clusters. And as we'll
    see in a later chapter, even an ACE edge site sometimes hosts
    more than one Kubernetes cluster (e.g., one running production
-   services and one used for development and testing of new
-   services).
+   services and one used for trial deployments of new services).
 
 2.3 Control and Management
 --------------------------

@@ -304,7 +303,7 @@ both physical and virtual resources.
 2.3.2 Lifecycle Management
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Lifecycle Management is the process of integrating fixed, extended,
+Lifecycle Management is the process of integrating debugged, extended,
 and refactored components (often microservices) into a set of
 artifacts (e.g., Docker containers and Helm charts), and subsequently
 deploying those artifacts to the operational cloud. It includes a

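To make the artifact-building step described in this hunk concrete, here is a sketch of a CI job that packages one microservice into the two artifact types named above, a Docker container and a Helm chart. It is written as a GitHub Actions workflow purely for illustration; the registry, service, and chart names are invented, not Aether's actual pipeline::

   # .github/workflows/build.yaml: an illustrative artifact-building job.
   name: build-artifacts
   on: [push]
   jobs:
     build:
       runs-on: ubuntu-latest
       steps:
         - uses: actions/checkout@v2
         # Build the container artifact (the image name is a placeholder).
         - run: docker build -t registry.example.com/my-service:${{ github.sha }} .
         # Package the Helm chart artifact (the chart path is a placeholder).
         - run: helm package ./charts/my-service
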
@@ -368,7 +367,7 @@ the cloud offers to end users. Thus, we can generalize the figure so
 Runtime Control mediates access to any of the underlying microservices
 (or collections of microservices) the cloud designer wishes to make
 publicly accessible, including the rest of AMP! In effect, Runtime
-Control implements an abstraction layer, codified with programmatic
+Control implements an abstraction layer, codified with a programmatic
 API.
 
 Given this mediation role, Runtime Control provides mechanisms to

@@ -434,7 +433,7 @@ operators a way to both read (monitor) and write (control) various
 parameters of a running system. Connecting those two subsystems is how
 we build closed loop control.
 
-A third example is even more ambiguous. Lifecycle management usually
+A third example is even more nebulous. Lifecycle management usually
 takes responsibility for *configuring* each component, while runtime
 control takes responsibility for *controlling* each component. Where
 you draw the line between configuration and control is somewhat

intro.rst

Lines changed: 48 additions & 34 deletions
@@ -72,6 +72,12 @@ perspective on the problem. We return to the confluence of enterprise,
 cloud, access technologies later in this chapter, but we start by
 addressing the terminology challenge.
 
+.. _reading_aether:
+.. admonition:: Further Reading
+
+   `Aether: 5G-Connected Edge Cloud
+   <https://opennetworking.org/aether/>`__.
+
 1.1 Terminology
 ---------------
 

@@ -107,7 +113,7 @@ terminology.
 * **OSS/BSS:** Another Telco acronym (Operations Support System,
   Business Support System), referring to the subsystem that
   implements both operational logic (OSS) and business logic
-  (BSS). Usually the top-most component in the overall O&M
+  (BSS). It is usually the top-most component in the overall O&M
   hierarchy.
 
 * **EMS:** Yet another Telco acronym (Element Management System),

@@ -164,34 +170,34 @@ terminology.
 * **Continuous Integration / Continuous Deployment (CI/CD):** An
   approach to Lifecycle Management in which the path from
   development (producing new functionality) to testing, integration,
-  and ultimately deployment is an automated pipeline. Typically
-  implies continuously making small incremental changes rather than
-  performing large disruptive upgrades.
+  and ultimately deployment is an automated pipeline. CI/CD
+  typically implies continuously making small incremental changes
+  rather than performing large disruptive upgrades.
 
 * **DevOps:** An engineering discipline (usually implied by CI/CD)
   that balances feature velocity against system stability. It is a
   practice typically associated with container-based (also known as
-  *cloud native*) systems, and typified by *Site Reliability
+  *cloud native*) systems, as typified by *Site Reliability
   Engineering (SRE)* practiced by cloud providers like Google.
 
 * **In-Service Software Upgrade (ISSU):** A requirement that a
   component continue running during the deployment of an upgrade,
   with minimal disruption to the service delivered to
-  end-users. Generally implies the ability to incrementally roll-out
-  (and roll-back) an upgrade, but is specifically a requirement on
-  individual components (as opposed to the underlying platform used
-  to manage a set of components).
+  end-users. ISSU generally implies the ability to incrementally
+  roll-out (and roll-back) an upgrade, but is specifically a
+  requirement on individual components (as opposed to the underlying
+  platform used to manage a set of components).
 
 * **Monitoring & Logging:** Collecting data from system components to aid
   in management decisions. This includes diagnosing faults, tuning
   performance, doing root cause analysis, performing security audits,
   and provisioning additional capacity.
 
 * **Analytics:** A program (often using statistical models) that
-  produces additional insights (value) from raw data. Can be used to
-  close a control loop (i.e., auto-reconfigure a system based on
+  produces additional insights (value) from raw data. It can be used
+  to close a control loop (i.e., auto-reconfigure a system based on
   these insights), but could also be targeted at a human operator
-  (that subsequently takes some action).
+  that subsequently takes some action.
 
 Another way to talk about operations is in terms of stages, leading to
 a characterization that is common for traditional network devices:

@@ -301,9 +307,9 @@ manageable:
   majority of configuration involves initiating software parameters,
   which is more readily automated.
 
-* Cloud native implies a set best-practices for addressing many of the
-  FCAPS requirements, especially as they relate to availability and
-  performance, both of which are achieved through horizontal
+* Cloud native implies a set of best-practices for addressing many of
+  the FCAPS requirements, especially as they relate to availability
+  and performance, both of which are achieved through horizontal
   scaling. Secure communication is also typically built into cloud RPC
   mechanisms.
 
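The horizontal scaling this hunk mentions is, in cloud native practice, often nothing more than a declared replica count that the platform maintains on the operator's behalf. A minimal sketch, with all names invented::

   # deployment.yaml: three interchangeable instances of one microservice,
   # which is how availability and performance scale horizontally.
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: example-service              # placeholder
   spec:
     replicas: 3                        # horizontal scale-out
     selector:
       matchLabels: {app: example-service}
     template:
       metadata:
         labels: {app: example-service}
       spec:
         containers:
           - name: example-service
             image: registry.example.com/example-service:1.0.0   # placeholder
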
@@ -319,17 +325,19 @@ monitoring data in a uniform way, and (d) continually integrating and
 deploying individual microservices as they evolve over time.
 
 Finally, because a cloud is infinitely programmable, the system being
-managed has the potential to change substantially over time.\ [#]_ This
-means that the cloud management system must itself be easily extended
-to support new features (as well as the refactoring of existing
-features). This is accomplished in part by implementing the cloud
-management system as a cloud service, but it also points to taking
-advantage of declarative specifications of how all the disaggregated
-pieces fit together. These specifications can then be used to generate
-elements of the management system, rather than having to manually
-recode them. This is a subtle issue we will return to in later
-chapters, but ultimately, we want to be able to auto-configure the
-subsystem responsible for auto-configuring the rest of the system.
+managed has the potential to change substantially over time.\ [#]_
+This means that the cloud management system must itself be easily
+extended to support new features (as well as the refactoring of
+existing features). This is accomplished in part by implementing the
+cloud management system as a cloud service, which means we will see a
+fair amount of recursive dependencies throughout this book. It also
+points to taking advantage of declarative specifications of how all
+the disaggregated pieces fit together. These specifications can then
+be used to generate elements of the management system, rather than
+having to manually recode them. This is a subtle issue we will return
+to in later chapters, but ultimately, we want to be able to
+auto-configure the subsystem responsible for auto-configuring the rest
+of the system.
 
 .. [#] For example, compare the two services Amazon offered ten years
    ago (EC2 and S3) with the well over 100 services available on
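The declarative specifications this hunk expands on are deliberately left abstract (the book returns to them in later chapters), but an invented sketch conveys the flavor: a machine-readable model of one component, from which elements of the management system could be generated rather than hand-coded. Every field name below is hypothetical::

   # component-model.yaml: invented for illustration; not an actual Aether schema.
   component: sd-core-up
   version: 1.0.0
   config:
     - name: upf-address      # a generated runtime-control API would expose this
       type: ipv4
   metrics:
     - throughput_bps         # a generated dashboard would plot these
     - active_sessions
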
@@ -371,13 +379,19 @@ identifies the technology we assume.
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 The assumed hardware building blocks are straightforward. We start
-with bare-metal servers and switches, built using merchant
-silicon. These might, for example, be ARM or x86 processor chips and
+with bare-metal servers and switches, built using merchant silicon
+chips. These might, for example, be ARM or x86 processor chips and
 Tomahawk or Tofino switching chips, respectively. The bare-metal boxes
 also include a bootstrap mechanism (e.g., BIOS for servers and ONIE
 for switches), and a remote device management interface (e.g., IPMI or
 Redfish).
 
+.. _reading_redfish:
+.. admonition:: Further Reading
+
+   Distributed Management Task Force (DMTF) `Redfish
+   <https://www.dmtf.org/standards/redfish>`__.
+
 A physical cloud cluster is then constructed with the hardware
 building blocks arranged as shown in :numref:`Figure %s <fig-hw>`: one
 or more racks of servers connected by a leaf-spine switching

@@ -397,11 +411,11 @@ that software running on the servers controls the switches.
 software components, which we describe next. Collectively, all the
 hardware and software components shown in the figure form the
 *platform*. Where we draw the line between what's *in the platform*
-and what runs *on top of the platform* will become clear in later
-chapters, but the summary is that different mechanisms will be
-responsible for (a) bringing up the platform and prepping it to host
-workloads, and (b) managing the various workloads that need to be
-deployed on that platform.
+and what runs *on top of the platform*, and why it is important, will
+become clear in later chapters, but the summary is that different
+mechanisms will be responsible for (a) bringing up the platform and
+prepping it to host workloads, and (b) managing the various workloads
+that need to be deployed on that platform.
 
 
 1.3.2 Server Virtualization

@@ -415,7 +429,7 @@ resources, all running on the commodity processors in the cluster:
 2. Kubernetes instantiates and interconnects containers.
 
 3. Helm charts specify how collections of related containers are
-   interconnected.
+   interconnected to build applications.
 
 These are all well known and ubiquitous, and so we only summarize them
 here. Links to related information for anyone that is not familiar
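For readers new to Helm, the "collections of related containers" in item 3 are expressed as a chart plus its dependencies. A minimal sketch, with the chart names and repository URL invented::

   # Chart.yaml: a hypothetical chart wiring two sub-charts into one application.
   apiVersion: v2
   name: example-app
   version: 0.1.0
   dependencies:
     - name: frontend
       version: 1.2.3
       repository: https://charts.example.com
     - name: backend
       version: 4.5.6
       repository: https://charts.example.com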

preface.rst

Lines changed: 15 additions & 15 deletions
@@ -11,21 +11,21 @@ job of it.
 The answer, we believe, is that the cloud is becoming ubiquitous in
 another way, as it moves from hundreds of datacenters to tens of
 thousands of enterprises. And while it is clear that the commodity
-cloud providers will happily manage those edge clusters as a logical
+cloud providers are eager to manage those edge clusters as a logical
 extension of their datacenters, they do not have a lock on the
 know-how for making that happen.
 
 This book lays out a roadmap that a small team of engineers followed
-over a course of a year to stand-up and operationalize a hybrid cloud
-spanning a dozen enterprises, and hosting a non-trivial cloud native
-service (5G connectivity in our case, but that’s just an example). The
-team was able to do this by leveraging 20+ open source components,
-but selecting those components is just a start. There were dozens of
-technical decisions to make along the way, and a few thousand lines of
-configuration code to write. We believe this is a repeatable exercise,
-which we report in this book. (And the code for those configuration
-files is open source, for those that want to pursue the topic in more
-detail.)
+over the course of a year to stand-up and operationalize a hybrid
+cloud that spans a dozen enterprises, and hosts a non-trivial cloud
+native service (5G connectivity in our case, but that’s just an
+example). The team was able to do this by leveraging 20+ open source
+components, but selecting those components is just a start. There were
+dozens of technical decisions to make along the way, and a few
+thousand lines of configuration code to write. We believe this is a
+repeatable exercise, which we report in this book. (And the code for
+those configuration files is open source, for those that want to
+pursue the topic in more detail.)
 
 Our roadmap may not be the right one for all circumstances, but it
 does shine a light on the fundamental challenges and trade-offs

@@ -41,8 +41,8 @@ How to operationalize a computing system is a question that’s as old
 as the field of *Operating Systems*. Operationalizing a cloud is just
 today’s version of that fundamental problem, which has become all the
 more interesting as we move up the stack, from managing *devices* to
-managing *services*. The fact that this topic is both timely and
-foundational are among the reasons it is worth studying.
+managing *services*. That this topic is both timely and foundational
+are among the reasons it is worth studying.
 
 
 Guided Tour of Open Source

@@ -80,11 +80,11 @@ Sunay for his influence on its overall design. Suchitra Vemuri's
 insights into testing and quality assurance were also invaluable.
 
 This book is still very much a work-in-progress, and we will happily
-acknowledge anyone that provides feedback. Please send us your
+acknowledge everyone that provides feedback. Please send us your
 comments using the `Issues Link
 <https://github.com/SystemsApproach/ops/issues>`__. Also see the
 `Wiki <https://github.com/SystemsApproach/ops/wiki>`__ for the TODO
-list we're working on.
+list we're currently working on.
 
 | Larry Peterson, Scott Baker, Andy Bavier, Zack Williams, and Bruce Davie
 | October 2021
