layout	title
presentation	Miscellaneous slides

Error handling

The previous example is a simple way to handle errors.

we'll learn a better way when we cover exceptions later in the course

Some general rules about error handling:

If your code is making assumptions about its inputs, check them!
- for example, if your code assumes a parameter is positive, check and alert the user if this is not the case (and exit if appropriate)
- this will save you time later on!

If you can reasonably handle the error, do so, but warn the user about it

Most of the time, the error will be fatal (not recoverable), in which case your program should not proceed
- issue an informative message about the problem, and exit with a non-zero exit code

try to detect errors as early as possible
- for example: don't start a long batch of processing without first checking that you can create the output file

Type Aliasing

We have already started using a vector of strings to store the fragments – and it does seem to provide all the functionality we need at this point

We could type std::vector<std::string> as the type for the fragment list wherever this is needed

but it is a bit long...
it is not obvious what its purpose is to outside observers
what if we decide to use a different type to store the fragments at a later stage? We'd need to modify our code in lots of different places

This is an example where defining a type alias might be a good idea:

using Fragments = std::vector<std::string>;

Now we can use Fragments as the type for our list of fragments, and the compiler will understand that to mean std::vector<std::string>

note this is not a new type - merely an alias for an existing type

Type Aliasing

What does this look like in our project?

... 

*using Fragments = std::vector<std::string>;

`Fragments` load_fragments (const std::string& filename)
{
  ...

... 

void fragment_statistics (const `Fragments`& fragments)
{
  ...

Structured binding

Splitting up grouped data

Structured binding

The C++17 standard introduced the ability to use structured binding

this allows us to declare and initialise multiple variables from the individual members of a larger composite variable

This is best illustrated by example, using our find_biggest_overlap() function:

`auto overlap` = find_biggest_overlap (sequence, fragments);
std::cerr << std::format ("overlap of size {} at index {}\n", 
                          `overlap.size`, `overlap.fragment`);

Using structured binding, we can declare independent variables for each member of the Overlap structure returned:

auto [ overlap, index ] = find_biggest_overlap (sequence, fragments);
std::cerr << std::format ("overlap of size {} at index {}\n", `overlap`, `index`);

Structured binding using `std::tuple`

The C++ STL provides a convenience template class that can be used as a lightweight replacement for a dedicated struct in a variety of contexts: std::tuple

One use for std::tuple is for structured binding:

#include <tuple>

std::tuple<int,int> find_biggest_overlap (const std::string& sequence, 
                                          Fragments& fragments)
{
  ...
  return { biggest_overlap, fragment_with_biggest_overlap };
}

...

auto [ overlap, index ] = find_biggest_overlap (sequence, fragments);

Why don't we use `std::tuple`?

While appealing in its simplicity, std::tuple does not provide any indication about the interpretation or meaning of each member

in our case, all we know is that we get 2 ints
which one represents the overlap size, and which the index?
We have to rely on additional documentation to work this out
- this could be described in a comment, or some other external document

On the other hand, using a struct provides named members, whose interpretation is immediately much clearer.

⇒ Better to use a struct with named members rather than a std::tuple where possible

The std::tuple class is very useful in other contexts, especially when working with generic functions and containers

a generic function or container will not know ahead of time what it is to contain
std::tuple provides a simple container for an arbitrary number of variables of arbitrary types
std::tuple is therefore used in many places throughout the C++ STL

Structured binding

Structured binding offers another way to return multiple values from a function

this can sometimes be more convenient or more legible

There are many other uses for structured binding

many of these are too advanced for this course
- indeed, arguably some of the most useful were only introduced in C++23!
if interested, have a look online for examples

`std::string_view`

avoiding needless copies

Performance issues

Let's take a look at our code so far, and try to reason about where most of the time is spent

The bulk of our algorithm consists of a series of nested loops:

• loop while there are fragments left to merge
• loop over all candidate fragments
• loop over all overlap sizes
• extract the corresponding substrings and compare
• merge the fragment with the largest overlap
• remove it from the list of candidate fragments

When looking for opportunities to improve performance, we should focus our attention on the innermost loop

this is the portion of code that will be executed most often

⇒ what is going on in the innermost loop?

The innermost loop

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = sequence.substr(0, overlap);
    const auto frag_end = fragment.substr(fragment.size()-overlap);
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

-- We actually have three loops, although we did not write them up ourselves!

The innermost loop

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = `sequence.substr(0, overlap)`;
    const auto frag_end = `fragment.substr(fragment.size()-overlap)`;
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

We actually have three loops, although we did not write them up ourselves!

one for extracting each substring

The innermost loop

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = sequence.substr(0, overlap);
    const auto frag_end = fragment.substr(fragment.size()-overlap);
    if (`seq_start == frag_end`) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

We actually have three loops, although we did not write them up ourselves!

one for extracting each substring
one for the comparison itself

The innermost loop

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    `const auto seq_start` = sequence.substr(0, overlap);
    `const auto frag_end` = fragment.substr(fragment.size()-overlap);
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

We actually have three loops, although we did not write them up ourselves!

one for extracting each substring
one for the comparison itself

We also have some implicit memory allocation & deallocation for each substring

we will cover this much later in the course

The innermost loop

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = sequence.substr(0, overlap);
    const auto frag_end = fragment.substr(fragment.size()-overlap);
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

Can we avoid any of these overheads?

there is not much that can be done about the comparison itself
- thankfully, it will tend to terminate early (as soon as any character doesn't match) --
but can we avoid copying each substring into a whole new std::string to perform the comparison?

Avoiding needless copying

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    bool match = true;
    for (int n = 0; n < overlap; ++n) {
      if (sequence[n] != fragment[fragment.size()-overlap+n]) {
        match = false;
        break;
      }
    }
    if (match) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

We can perform the comparison manually character-by-character, ensuring we compare the characters at the correct positions

The `std::string_view` class



int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = sequence.substr(0, overlap);
    const auto frag_end = fragment.substr(fragment.size()-overlap);
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

... or we can use the std::string_view class to achieve the same effect!

The `std::string_view` class

#include <string_view>
...
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = `std::string_view (sequence)`.substr(0, overlap);
    const auto frag_end = `std::string_view (fragment)`.substr(fragment.size()-overlap);
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

... or we can use the std::string_view class to achieve the same effect!

The `std::string_view` class

How does this work?

Each std::string holds information consisting of:

the size of the string
a handle to the memory location corresponding to the start of the text
- this is actually a pointer – which we are trying to avoid as much as possible on this course!

                         ↓
(*#&\:;*'¬|\^$(&*^$%("()!`this is the text that our string contains`@~@:~#*(£';~"£*(£|#@

-- Memory is essentially a very long list of 8-bit bytes

most of these are going to contain data in machine representation (binary)
- none of that will make sense as text
the text for our string will occupy some contiguous section of memory
- the std::string class only needs to know where it starts and how long it is

The std::string_view class holds exactly the same information

The `std::string_view` class

The std::string_view class provides a non-owning (and non-modifiable) view of a string

Notably, it provides its own substr() method, which also returns a std::string_view

with a different start and size for the substring
no copying is performed!

Original std::string:

start at position 1000, size 41 bytes

                         ↓
(*#&\:;*'¬|\^$(&*^$%("()!`this is the text that our string contains`@~@:~#*(£';~"£*(£|#@
⁣

The `std::string_view` class

The std::string_view class provides a non-owning (and non-modifiable) view of a string

Notably, it provides its own substr() method, which also returns a std::string_view

with a different start and size for the substring
no copying is performed!

Original std::string:

start at position 1000, size 41 bytes

                         ↓
(*#&\:;*'¬|\^$(&*^$%("()!this is the text that `our string` contains@~@:~#*(£';~"£*(£|#@
                                               ↑

std::string_view for "our string":

start at position 1022, size 10 bytes

Ownership and lifetime

The main difference between std::string and std::string_view is one of ownership

Each std::string 'owns' the memory where its text is stored, and is responsible for its management

it needs to allocate the memory when some text needs to be stored
it needs to free or deallocate that memory when no longer in use, so that the memory can be used for other purposes

Each std::string_view only provides a view into some existing text in memory

it does not own that memory, and is not responsible for managing it

Consequently, std::string_view instances are very cheap to create and destroy

no memory management!
no copying!

But we need to be very careful with the lifetime of that memory location...

Ownership and lifetime

When extracting a substring using the original std::string, we got a new std::string

this new std::string contained a fully self-contained copy of the corresponding text
its lifetime was not tied to the lifetime of the original std::string

But when we extract a substring using the new std::string_view, we get a non-owning view

the text in that view is completely dependent on the original std::string
its lifetime is intrisincally tied to that of the original std::string

There is nothing to prevent the use of a std::string_view that is no longer valid!

no mechanism exists in C++ to prevent this
if the original std::string no longer exists, or points to new location, we will get undefined behaviour
- the code may crash, we could get nonsense back, etc.

-- It is up to the developer (you!) to ensure the lifetime of the text exceeds the lifetime of the view

this is a very common source of bugs and security vulnerabilities

Ownership and lifetime – examples

std::string text ("this is the text that our string contains");
std::cout << text << "\n";

auto view = std::string_view (text).substr (22, 10);
std::cout << view << "\n";

text.clear();     // ⇐ original string is no longer valid!
std::cout << view << "\n";

If you try this, you may find that the code works just fine

this is because the memory where the text was stored hasn't been modified
that memory location has been marked as unused and is available for other uses
... but we haven't needed to store anything else there – yet!

This is why the result of the last line of code is undefined behaviour

it may work as expected sometimes, or it may crash other times
it may depend on what compiler is used, or what settings were used for the compiler, etc.

Ownership and lifetime – examples

std::string text ("this is the text that our string contains");
std::cout << text << "\n";

auto view = std::string_view (text).substr (22, 10);
std::cout << view << "\n";

text = "a longer piece of text that won't fit in the original memory location");
// ⇑ forces re-allocation: original string is no longer valid!
std::cout << view << "\n";

Assigning a different string to our variable can also force a re-allocation

particularly if the new string is longer than the original
there is no guarantee that the new string will be allocated at the same location!

Ownership and lifetime – examples

std::string_view get_substr () 
{
  std::string original = read_from_file();
  return std::string_view (original);
}

// ⇓ original string has gone out of scope and no longer exists!
std::cout << get_substr() << "\n";

Returning a non-owning view or reference to a function scope, local variable is a common mistake!

this applies to any non-owning reference or similar construct

Ownership and lifetime – examples

// a vector of 16 integers:
std::vector<int> v = (16, 0); 

// we take an iterator to the first element:
auto iter = v.begin();

// we take a reference to the second element:
int& ref (v[1]);

// we append another element to the vector:
v.push_back (1);
// ⇑ this may cause a re-allocation of the contents of the vector
// to a different memory location!

This issue is much broader than just std::string_view

it affects any constructs that refers to data owned or managed by some other construct
it can affect references and STL iterators too!

Ownership and lifetime – examples

These are examples of a more general class of problems, generally referred to as dangling references

you will also come across 'dangling pointers', which is essentially the same problem

This applies to references, iterators, pointers and any non-owning construct

It is not always obvious when such references become invalidated

the original owner has been destroyed or gone out of scope
the original owner has been cleared of its contents
the original owner has moved its contents to a different location

For these reasons, use non-owning constructs with caution!

in general, they are safe as function arguments
- the owning object lives in a broader scope with a longer lifetime, and will not change while the function runs
they are not safe as return values
- especially when referring to function scope variables!
they are not safe when there is a possibility of modification of the original container

Back to our own code...

`#include <string_view>`
...
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
  ...
  for (int overlap = fragment.size(); overlap > 0; --overlap) {
    const auto seq_start = `std::string_view (sequence)`.substr(0, overlap);
    const auto frag_end = `std::string_view (fragment)`.substr(fragment.size()-overlap);
    if (seq_start == frag_end) {
      largest_overlap = overlap;
      break;
    }
  }
  ...

In our case, we can see that:

the lifetime of both std::string_views is shorter than the original sequence & fragments
there is no scope for our code to modify either sequence or fragment while our views are live
- indeed, we enforced this by providing these to compute_overlap() as const references!

Impact of using `std::string_view` on performance

We can measure the impact of this simple change using the time command:

Before the change:

$ time ./shotgun ../data/fragments-1.txt out
initial sequence has size 1000
final sequence has length 20108

`real    0m0.359s`
user    0m0.359s
sys     0m0.000s

Impact of using `std::string_view` on performance

We can measure the impact of this simple change using the time command:

After the change:

$ time ./shotgun ../data/fragments-1.txt out
initial sequence has size 1000
final sequence has length 20108

`real    0m0.041s`
user    0m0.037s
sys     0m0.004s

This is almost 10× faster!

Abstraction and encapsulation

The difference between abstraction and encapsulation is far from obvious!

According to Bjarne Stroustrup, the original developer of C++:

abstraction is the act of specifying a general interface hiding implementation details.
encapsulation is the enforcement of abstraction by mechanisms that prevent access to implementation details of an object or a group of objects except through a well-defined interface.

Abstraction relates to the process by which we decide which methods to make available to users, and which aspects of the implementation (data members, etc) should be hidden from users

this is more about the design process

encapsulation relates the means by which abstraction is enforced.

C++ classes provides a mechanism to achieve encapsulation
this is more about the implementation process

In practice, the two concepts go hand in hand: abstraction relies on encapsulation, and vice-versa

The copy constructor and assignment operator

If we have a class that requires a non-default destructor, we will typically also need to redefine the copy constructor and assignment operator

this is also referred to as the rule of five
- technically, we would also need to define the move constructor and move assignment operator, but this involves move semantics, which is beyond the scope of this course.

Let's consider the std::string as an example of how this works:

std::string needs to manage a region of memory to hold its data --
its constructor needs to acquire the required amount of memory from the system --
its destructor needs to release that memory back to the system
- otherwise programs that use std::string will suffer from memory leaks
- for long-running programs, this will eventually use up all of the available memory and crash --
the copy constructor needs to acquire some memory from the system and copy the contents of the other string into that memory --
the assignment operator needs to release the memory it was previously managing, acquire enough memory to hold the new contents, and copy the contents of the other string into its own memory
- it may skip the memory release / re-acquire if it already had enough memory to hold the new contents

The copy constructor and assignment operator

The copy constructor is invoked when creating an instance initialised as a copy of another:

std::string a = "hello!";   // <= use normal constructor

// the following all invoke the copy constructor:
std::string b (a); 
std::string c = a;    // <= equivalent to "std::string c (a);"
auto d = a;

All of the above will invoke std::string's copy constructor
Note: when using the assignment operator (=) at declaration time, the C++ standard specifies that this actually invokes the copy constructor

The assignment operator is invoked when an instance is assigned to another

as long as the assignment is not done as part of the declaration, since that uses the copy constructor

std::string a = "hello!";
std::string b = "goodbye";

b = a;    // <= invokes assignment operator

The copy constructor and assignment operator

To illustrate, let's consider a class to manage a session with a remote server:

class Session {
  public:
    Session (const std::string& url, const std::string& username);
    ~Session ();

  private:
    int m_socket;
};

FilesExpand file tree

misc.md

Latest commit

History

misc.md

File metadata and controls

Error handling

Type Aliasing

Type Aliasing

Structured binding

Splitting up grouped data

Structured binding

Structured binding using std::tuple

Why don't we use std::tuple?

Structured binding

std::string_view

avoiding needless copies

Performance issues

The innermost loop

The innermost loop

The innermost loop

The innermost loop

The innermost loop

Can we avoid any of these overheads?

Avoiding needless copying

The std::string_view class

The std::string_view class

The std::string_view class

The std::string_view class

The std::string_view class

Ownership and lifetime

Ownership and lifetime

Ownership and lifetime

Ownership and lifetime – examples

Ownership and lifetime – examples

Ownership and lifetime – examples

Ownership and lifetime – examples

Ownership and lifetime – examples

Back to our own code...

Impact of using std::string_view on performance

Impact of using std::string_view on performance

Abstraction and encapsulation

The copy constructor and assignment operator

The copy constructor and assignment operator

The copy constructor and assignment operator

Structured binding using `std::tuple`

Why don't we use `std::tuple`?

`std::string_view`

The `std::string_view` class

The `std::string_view` class

The `std::string_view` class

The `std::string_view` class

The `std::string_view` class

Impact of using `std::string_view` on performance

Impact of using `std::string_view` on performance