Extra Credit Presentation: std::string_view

1. Introduction

The std::string_view class was added to the C++ standard library in C++17. It is a lightweight "view" of a string:

std::string_view sv = "Hello, world!";

... but it does not own the string it points to:

std::string_view getFirstWord() {
    std::string temp = "Hello";
    return temp; // Warning: This will dangle!
}

It is used because it avoids the overhead of allocating memory and copying characters, while still offering most of the read-only interface of std::string. This makes it especially useful for passing strings to functions without copying them.
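To make the "passing to functions" point concrete, here is a minimal sketch. The helper countChar and the demo function are hypothetical names introduced only for illustration; the point is that one std::string_view parameter accepts a literal, a std::string, and a slice of a character array without creating a std::string copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts occurrences of `c` without copying the text.
std::size_t countChar(std::string_view text, char c) {
    std::size_t n = 0;
    for (char ch : text) {
        if (ch == c) ++n;
    }
    return n;
}

// Three call styles, one function, no std::string copies.
void demoCallers() {
    std::string owned = "Hello, world!";
    const char raw[] = "Hello, world!";
    assert(countChar("Hello, world!", 'l') == 3);          // string literal
    assert(countChar(owned, 'o') == 2);                     // std::string
    assert(countChar(std::string_view(raw, 5), 'l') == 2);  // only "Hello"
}
```

With a const std::string& parameter, the first and third calls would each have to build a temporary std::string first.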


2. string vs. string_view

2.1 Construction from string literals

When constructing from a string literal, std::string_view is a clear winner. A benchmark on OnlineGDB showed a roughly 10-20 times speedup when constructing a std::string_view from a string literal, compared to constructing a std::string from the same literal.

What is happening under the hood?

  • Constructing a std::string:
    • Allocate memory for the string (short strings may instead use the small internal buffer that most implementations provide).
    • Copy the string literal into that storage.
std::string str = "Hello, World!";
  • Constructing a std::string_view:
    • Simply point to the (read-only) memory of the string literal and store its length.
std::string_view sv = "Hello, World!";
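The difference can be observed directly by comparing data pointers. This is only a sketch (literalStorageDemo is a hypothetical name): the view's data() is the literal's own address, while the std::string's data() points at a separate copy.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true if the view aliases the literal and the string owns a copy.
bool literalStorageDemo() {
    const char* literal = "Hello, World!";
    std::string_view sv = literal;  // measures the length, copies nothing
    std::string str = literal;      // copies the characters into its own storage
    assert(sv.size() == 13);
    assert(std::string_view(str) == sv);  // same contents...
    return sv.data() == literal && str.data() != literal;  // ...different storage
}
```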

2.2 Similarities with std::string

  • Supports Random Access:
std::string_view sv = "Hello";
char ch2 = sv[1];
  • Supports non-modifying string operations (e.g., find, substr, compare, etc.):
std::string_view sv = "Hello, World!";
size_t pos = sv.find("World");
  • Supports iterators, but they are always const iterators:
for (auto it = sv.begin(); it != sv.end(); ++it) {
    // Do something with *it
}

2.3 Differences from std::string

  • Immutable:
std::string_view sv = "Hello";
// sv[0] = 'Y';  // Compilation error
  • Lacks memory ownership/allocation:
std::string str = "Hello";  // str owns (and may allocate) its own storage
std::string_view sv = str;  // No additional memory allocation
  • Lacks support for string operations that modify the underlying string:
std::string_view sv = "Hello";
// sv.append(" World");  // No such function

2.4 Passing string to function

I came across someone on Reddit claiming that "You always prefer std::string_view over const string&." Is that so?

Consider when you:

  • Need to modify the string.
  • Want to return a string and ensure its longevity.
  • Interact with APIs mandating std::string.

In such cases, std::string_view isn't the optimal choice.
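A rough sketch of how the parameter and return types might be chosen for the cases above. All three function names are hypothetical and exist only to illustrate the trade-off.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Read-only access: std::string_view is a good fit.
bool startsWithHash(std::string_view line) {
    return !line.empty() && line.front() == '#';
}

// The result must outlive the argument: return an owning std::string.
std::string firstWord(std::string_view text) {
    return std::string(text.substr(0, text.find(' ')));
}

// The text is modified: take a std::string (by value here) and return it.
std::string toUpper(std::string text) {
    for (char& ch : text) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return text;
}

void demoChoices() {
    assert(startsWithHash("# comment"));
    assert(firstWord("Hello world") == "Hello");
    assert(toUpper("abc") == "ABC");
}
```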


3. Null Termination

It should be noted that the result of std::string_view::substr() is generally not null-terminated. This makes sense: it points to a substring of the original string, and the character just past that substring is usually not a null character. Typically this isn’t an issue because std::string_view tracks the length of the data it points to. However, it’s best not to pass a std::string_view's data() to APIs that expect C-style strings:

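#include <cstring>
#include <string>
#include <string_view>

void copyToBuffer(const char* source, char* dest) {
    strcpy(dest, source);  // Copies until it finds a null character
}
std::string original = "Hello, world!";
std::string_view view(original);
view = view.substr(0, 5);  // Now, the view is "Hello"

char buffer[10];
// Unsafe! There is no null character after "Hello", so strcpy keeps reading
// to the end of original (14 bytes with the terminator) and overflows buffer.
copyToBuffer(view.data(), buffer);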

There are two ways to fix this:

  • Use strncpy() with the view's size() as the count (capped at the buffer size minus one) and manually add a null character after the copied characters.
  • Convert the std::string_view to a std::string before passing it to the function, but in this case, you might as well just use std::string.
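Both fixes can be sketched as follows. This is one possible shape, not the only one: the three-argument copyToBuffer signature and lengthViaCApi are assumptions made for this example, and the first function uses std::string_view::copy in place of strncpy() because it already limits the copy to the view's length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Fix 1: length-bounded copy that always null-terminates the buffer.
// Returns the number of characters copied (excluding the terminator).
std::size_t copyToBuffer(std::string_view source, char* dest, std::size_t destSize) {
    assert(dest != nullptr);
    if (destSize == 0) return 0;
    const std::size_t n = source.copy(dest, destSize - 1);  // copies min(count, size())
    dest[n] = '\0';
    return n;
}

// Fix 2: make an owning, null-terminated copy before calling a C API.
std::size_t lengthViaCApi(std::string_view view) {
    const std::string owned(view);  // explicit conversion; this allocates or copies
    return std::strlen(owned.c_str());
}
```

If the source does not fit, the first version truncates silently; whether that is acceptable depends on the caller.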

4. Simple Parser Example

std::string_view is particularly effective in parsing. For tasks that heavily process strings, we can load a text file into memory as a single block and then tokenize it, creating a std::string_view for each token. This provides a lightweight reference to each token and offers nearly all the advantages of std::string without any extra heap allocations. Previously, to avoid creating many small std::string objects, we would have needed to devise such a mechanism ourselves.

Let's examine a simple parser that divides a string into tokens using a comma as the delimiter. First, here's the approach using std::string:

#include <string>
#include <vector>
#include <sstream>

std::vector<std::string> tokenize(const std::string& input, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string token;

    // Loop through the string, extracting tokens separated by the delimiter.
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}

In this implementation:

  • We convert the input string into a stream using std::istringstream.
  • The std::getline function reads from this stream, splitting the string at each occurrence of the delimiter.
  • Each split portion (token) is added to the tokens vector.

For comparison, here's how we can achieve this with std::string_view:

#include <string_view>
#include <vector>

std::vector<std::string_view> tokenize(std::string_view input, char delimiter) {
    std::vector<std::string_view> tokens;
    size_t start = 0;
    size_t end = input.find(delimiter);

    // Loop through the string_view, extracting tokens separated by the delimiter.
    while (end != std::string_view::npos) {
        tokens.push_back(input.substr(start, end - start));
        start = end + 1;
        end = input.find(delimiter, start);
    }
    tokens.push_back(input.substr(start));
    return tokens;
}

In the std::string_view version:

  • We use the find method to locate each delimiter's position in the string.
  • The substr method then creates a new string_view pointing to the substring between two delimiter positions, without copying any characters.
  • This process is repeated until the entire string is tokenized.

When tested, the std::string_view method was observed to be approximately 6 times faster than its std::string counterpart.
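Two practical caveats are worth checking before swapping one version for the other. The sketch below repeats both functions under the hypothetical names tokenizeCopy and tokenizeView so they can coexist. It shows that they disagree on a trailing delimiter and on empty input, and that the views point into the caller's buffer, which therefore has to outlive the tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

std::vector<std::string> tokenizeCopy(const std::string& input, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string token;
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}

std::vector<std::string_view> tokenizeView(std::string_view input, char delimiter) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    std::size_t end = input.find(delimiter);
    while (end != std::string_view::npos) {
        tokens.push_back(input.substr(start, end - start));
        start = end + 1;
        end = input.find(delimiter, start);
    }
    tokens.push_back(input.substr(start));  // last token, possibly empty
    return tokens;
}

void demoDifferences() {
    const std::string text = "a,b,";
    assert(tokenizeCopy(text, ',').size() == 2);  // getline drops the trailing empty token
    assert(tokenizeView(text, ',').size() == 3);  // find/substr keeps it
    assert(tokenizeCopy("", ',').empty());        // no tokens for empty input
    assert(tokenizeView("", ',').size() == 1);    // one empty token

    const std::string line = "x,yy,zzz";
    const auto views = tokenizeView(line, ',');
    assert(views[1] == "yy");
    assert(views[1].data() == line.data() + 2);   // points into `line`, no copy
}
```

If the two versions need to agree, one option is to skip the final push_back in the view version when the input is empty or ends with the delimiter.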


5. Is it more like a pointer or reference?

5.1 Similarities with pointers

  • Non-Owning:
    • Just like a pointer, a std::string_view does not own the data it points to. It simply provides a view or a window into some part of another string or character array.
  • Rebindable:
    • A std::string_view can be rebound to point to different strings during its lifetime, similar to how pointers can be reassigned to point to different addresses.
  • Can be null:
    • A std::string_view need not point to any string, analogous to a null pointer. A default-constructed std::string_view has data() == nullptr and size() == 0, so it is both null and empty. It still compares equal to a view of the empty string "".
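The rebinding and null behaviour can be checked with a few lines. The function names are hypothetical; the facts exercised are that assignment only retargets the view, and that a default-constructed view has data() == nullptr and size() == 0.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Rebinding: assignment changes what the view refers to, not the characters.
bool rebindLeavesOriginalsIntact() {
    std::string a = "first";
    std::string b = "second";
    std::string_view sv = a;
    assert(sv == "first");
    sv = b;  // now views b; a is untouched
    return sv == "second" && a == "first";
}

// Default-constructed view: null data pointer and zero length.
bool defaultViewIsNullAndEmpty() {
    std::string_view sv;
    return sv.data() == nullptr && sv.empty() && sv == "";
}

// A view of "" is also empty, but its data() points at the literal.
bool emptyLiteralViewIsNotNull() {
    std::string_view sv = "";
    return sv.data() != nullptr && sv.empty();
}
```

Since both kinds of empty view compare equal, code that needs to tell "no string" from "empty string" has to inspect data() explicitly.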

5.2 Similarities with references

  • No need for dereferencing:
    • Unlike a pointer to a std::string, you can directly access the elements of the underlying string that a std::string_view points to, without having to dereference it first. It feels more like working with a reference or a regular object in this regard.
std::string_view sv = "Hello";
char firstChar = sv[0];  // Directly access the first character
std::cout << firstChar << '\n';  // Outputs: H

6. Q&A

Question 1.

What is wrong with the following code?

std::string_view sv = std::string("Hello") + " World!";

Let’s break down this code:

  • std::string("Hello") + " World!": This creates a temporary std::string with the value "Hello World!".
  • std::string_view sv: This creates a string_view that points to the memory of the temporary std::string. After this line of code completes, the temporary std::string with value "Hello World!" gets destroyed, making sv a dangling string_view.
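One way to fix it, sketched with hypothetical function names, is to give the concatenated string a named owner that outlives the view, or to return the owning std::string when the result has to leave the function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// The owner is a named variable, so the view stays valid while it is in scope.
std::size_t safeLength() {
    std::string owner = std::string("Hello") + " World!";
    std::string_view sv = owner;
    assert(sv == "Hello World!");
    return sv.size();
}

// When the text must outlive the function, return the owner, not a view.
std::string makeGreeting() {
    return std::string("Hello") + " World!";
}
```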