Extra Credit Presentation: std::string_view
1. Introduction
The std::string_view class was added to the C++ standard library in C++17. It is a lightweight, read-only "view" of a string:
std::string_view sv = "Hello, world!";
... but it does not own the string it points to, so it must not outlive that string:
std::string_view getFirstWord() {
std::string temp = "Hello";
return temp; // Warning: This will dangle!
}
It is useful because it avoids the overhead of allocating memory and copying characters, while still providing a string-like, read-only interface. In particular, it avoids copying strings when passing them to functions.
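Here is a minimal sketch of the function-passing point; countVowels is a hypothetical helper, not part of any library. The parameter is a std::string_view taken by value, so each call copies only a pointer and a length. That holds whether the argument is a literal, a std::string, or a substring view:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts lowercase vowels in any character sequence.
// Taking std::string_view by value copies only a pointer and a length.
std::size_t countVowels(std::string_view text) {
    std::size_t count = 0;
    for (char c : text) {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            ++count;
        }
    }
    return count;
}
```

For example, countVowels("hello world") returns 3. Passing std::string_view(s).substr(0, 7) for s = "programming" counts the vowels of "program" only, without creating a new string.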
2. string vs. string_view
2.1 Construction from string literals
When constructing from a string literal, std::string_view is the clear winner. A benchmark on OnlineGDB showed a roughly 10-20x speedup when constructing a std::string_view from a string literal, compared to constructing a std::string from the same literal.
What is happening under the hood?
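- Constructing a std::string:
  - Allocate memory for the string (unless it is short enough for the small-string optimization that most implementations provide).
  - Copy the characters of the string literal into that memory.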
std::string str = "Hello, World!";
- Constructing a std::string_view:
  - Simply point to the (read-only) memory of the string literal while storing its size.
std::string_view sv = "Hello, World!";
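You can observe the difference directly by comparing pointers. In this minimal sketch (both function names are hypothetical), the view's data() is the literal's own address. A std::string, by contrast, always holds its own copy of the characters: on the heap, or for short strings like this one, in a small-string-optimization buffer inside the object.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true if the view refers to the literal's own characters (no copy made).
bool viewSharesLiteralStorage() {
    const char* literal = "Hello, World!";
    std::string_view sv = literal;  // stores the pointer and the length 13
    return sv.data() == literal && sv.size() == 13;
}

// Returns true if std::string made its own copy of the same characters.
bool stringCopiesLiteral() {
    const char* literal = "Hello, World!";
    std::string str = literal;  // copies 13 characters into storage owned by str
    return str.data() != literal && str == literal;
}
```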
2.2 Similarities with std::string
- Supports Random Access:
std::string_view sv = "Hello";
char ch2 = sv[1];
- Supports non-modifying string operations (e.g., find, substr, compare):
std::string_view sv = "Hello, World!";
size_t pos = sv.find("World");
- Supports iterators, but they are always const iterators:
for (auto it = sv.begin(); it != sv.end(); ++it) {
// Do something with *it
}
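The following sketch combines these read-only operations. secondWord and countUpper are hypothetical helpers written for illustration. Every substr call returns another view into the same characters rather than a new string:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Extracts the word after ", " in a greeting such as "Hello, World!",
// using only non-modifying std::string_view operations.
std::string_view secondWord(std::string_view greeting) {
    std::size_t comma = greeting.find(", ");
    if (comma == std::string_view::npos) {
        return {};  // no separator: return an empty view
    }
    std::string_view rest = greeting.substr(comma + 2);  // e.g. "World!"
    std::size_t end = rest.find('!');
    return rest.substr(0, end);  // substr clamps npos to the remaining length
}

// Counts uppercase ASCII letters by walking the view with (const) iterators.
int countUpper(std::string_view sv) {
    int count = 0;
    for (auto it = sv.begin(); it != sv.end(); ++it) {
        if (*it >= 'A' && *it <= 'Z') {
            ++count;
        }
    }
    return count;
}
```

For instance, secondWord("Hello, World!").compare("World") returns 0, and countUpper("Hello, World!") returns 2.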
2.3 Differences from std::string
- Immutable:
std::string_view sv = "Hello";
// sv[0] = 'Y'; // Compilation error
- Lacks memory ownership/allocation:
std::string str = "Hello"; // Memory allocation here
std::string_view sv = str; // No additional memory allocation
- Lacks support for string operations that modify the underlying string:
std::string_view sv = "Hello";
// sv.append(" World"); // No such function
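You can still change the view itself, just not the characters behind it. In this sketch (trimSpaces and viewSeesEdit are hypothetical names), remove_prefix and remove_suffix shrink the view without touching the data. The second function shows that because the view owns nothing, an in-place edit to the underlying std::string is visible through it:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Trims leading and trailing spaces by shrinking the view itself.
// remove_prefix/remove_suffix change only the view's start and length;
// the underlying characters are never modified.
std::string_view trimSpaces(std::string_view sv) {
    while (!sv.empty() && sv.front() == ' ') {
        sv.remove_prefix(1);
    }
    while (!sv.empty() && sv.back() == ' ') {
        sv.remove_suffix(1);
    }
    return sv;
}

// Shows that a view reflects in-place edits to the std::string it refers to
// (valid here because s[0] = 'J' does not reallocate the string's buffer).
bool viewSeesEdit() {
    std::string s = "Hello";
    std::string_view sv = s;
    s[0] = 'J';
    return sv == "Jello";
}
```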
2.4 Passing strings to functions
I came across someone on Reddit claiming that "You always prefer std::string_view over const string&." Is that so?
Consider cases where you:
- Need to modify the string.
- Want to return a string and ensure its longevity.
- Interact with APIs that require a std::string.
In such cases, std::string_view isn't the optimal choice.
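Keeping a string beyond the call (the second case above) is where a stored std::string_view becomes a dangling hazard. Here is a minimal sketch assuming a hypothetical NameRegistry class. It accepts std::string_view for convenience but stores owning std::string copies. Converting a std::string_view to a std::string must be explicit, so the copy is visible in the code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical class that must keep names after the caller's strings are gone.
class NameRegistry {
public:
    void add(std::string_view name) {
        // emplace_back calls std::string's explicit constructor from string_view,
        // so the registry gets its own copy of the characters.
        names_.emplace_back(name);
    }
    const std::string& at(std::size_t i) const { return names_.at(i); }
    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;  // owning storage, never dangles
};

// Adds a name built from a temporary std::string that is destroyed after the call.
NameRegistry buildRegistry() {
    NameRegistry registry;
    registry.add(std::string("Ada") + " Lovelace");  // temporary dies at the ';'
    return registry;
}
```

If names_ were a std::vector<std::string_view>, the entry added in buildRegistry would refer to a temporary that no longer exists.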
3. Null Termination
Note that the result of std::string_view::substr() is not guaranteed to be null-terminated. This makes sense: it points to a substring of the original string, and the character right after that substring is very likely not a null character. Typically this isn't an issue, because std::string_view stores the length of the data it points to. However, it's best not to pass a std::string_view's data() to APIs that require C-style (null-terminated) strings:
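#include <cstring>
#include <string>
#include <string_view>
void copyToBuffer(const char* source, char* dest) {
    std::strcpy(dest, source);  // copies until it reaches a '\0'
}
int main() {
    std::string original = "Hello, world!";
    std::string_view view(original);
    view = view.substr(0, 5);  // Now, the view is "Hello"
    char buffer[10];
    // Unsafe! view.data() is not null-terminated after "Hello", so strcpy keeps
    // reading through ", world!" and writes 14 bytes into a 10-byte buffer.
    copyToBuffer(view.data(), buffer);
}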
There are two ways to fix this:
- Use strncpy() with the view's size() as the length, and manually add a null character at the end of the buffer.
- Convert the std::string_view to a std::string before passing it to the function, but in that case, you might as well just use std::string.
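The following sketch shows both fixes. copyViewToBuffer and copyViaString are hypothetical names. The first assumes the caller passes the buffer's size (greater than zero), and it truncates rather than overflows when the view is too long:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Fix 1: copy at most view.size() characters with strncpy, then terminate manually.
// Assumes destSize > 0; truncates if the view does not fit.
void copyViewToBuffer(std::string_view view, char* dest, std::size_t destSize) {
    std::size_t n = view.size() < destSize - 1 ? view.size() : destSize - 1;
    std::strncpy(dest, view.data(), n);  // reads only n characters of the view
    dest[n] = '\0';
}

// Fix 2: make an owning, null-terminated copy, then hand c_str() to the C API.
void copyViaString(std::string_view view, char* dest) {
    std::string owned(view);           // explicit conversion; owned.c_str() ends in '\0'
    std::strcpy(dest, owned.c_str());  // caller must ensure dest is large enough
}
```

For example, copying view.substr(0, 5) of "Hello, world!" into a 10-byte buffer yields "Hello" with either function. Copying "Hello" into a 4-byte buffer with copyViewToBuffer yields "Hel".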
4. Simple Parser Example
std::string_view is particularly effective for parsing. For tasks that process strings heavily, we can load a text file into memory as a single block and then tokenize it, creating a std::string_view for each token. Each view is a lightweight reference to its token and offers nearly all the advantages of std::string without a heap allocation per token. Before C++17, avoiding the creation of many small std::string objects meant devising such a mechanism ourselves.
Let's examine a simple parser that divides a string into tokens using a comma as the delimiter. First, here's the approach using std::string:
#include <string>
#include <vector>
#include <sstream>
std::vector<std::string> tokenize(const std::string& input, char delimiter) {
std::vector<std::string> tokens;
std::istringstream stream(input);
std::string token;
// Loop through the string, extracting tokens separated by the delimiter.
while (std::getline(stream, token, delimiter)) {
tokens.push_back(token);
}
return tokens;
}
In this implementation:
- We wrap the input string in a stream using std::istringstream.
- The std::getline function reads from this stream, splitting the string at each occurrence of the delimiter.
- Each split portion (token) is copied into a std::string and added to the tokens vector.
For comparison, here's how we can achieve this with std::string_view:
#include <cstddef>
#include <string_view>
#include <vector>
std::vector<std::string_view> tokenize(std::string_view input, char delimiter) {
std::vector<std::string_view> tokens;
std::size_t start = 0;
std::size_t end = input.find(delimiter);
// Loop through the string_view, extracting tokens separated by the delimiter.
while (end != std::string_view::npos) {
tokens.push_back(input.substr(start, end - start));
start = end + 1;
end = input.find(delimiter, start);
}
tokens.push_back(input.substr(start));
return tokens;
}
In the std::string_view version:
- We use the find method to locate each delimiter's position in the string.
- The substr method then creates a new string_view pointing to the characters between two delimiter positions, without copying any characters.
- This repeats until no delimiter remains; the final push_back adds the token after the last delimiter.
- The returned views refer to the caller's buffer, so that buffer must outlive the vector of tokens.
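As a variation that is not from the original, the same find/substr technique can consume the input view in place. That lets tokens be processed one at a time without building a vector at all. nextToken and countFields are hypothetical names; the loop in countFields stops as soon as the input is exhausted:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical variant: returns the next token and advances `input` past it.
// Each token is a view into the original buffer; nothing is copied.
std::string_view nextToken(std::string_view& input, char delimiter) {
    std::size_t end = input.find(delimiter);
    std::string_view token = input.substr(0, end);  // whole rest if no delimiter
    // Skip the token, plus the delimiter if one was found.
    input.remove_prefix(end == std::string_view::npos ? input.size() : end + 1);
    return token;
}

// Counts delimiter-separated fields without storing any of them.
std::size_t countFields(std::string_view csv, char delimiter) {
    std::size_t fields = 0;
    while (!csv.empty()) {
        nextToken(csv, delimiter);
        ++fields;
    }
    return fields;
}
```

The two tokenize versions above differ on edge cases. For the input "a,b," the std::getline version returns {"a", "b"}, while the std::string_view version returns {"a", "b", ""}. For an empty input they return {} and {""} respectively. countFields follows the std::getline behavior, so countFields("a,b,", ',') is 2.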
When tested, the std::string_view version was observed to be approximately 6 times faster than its std::string counterpart.
5. Is it more like a pointer or a reference?
5.1 Similarities with pointers
- Non-owning: Just like a pointer, a std::string_view does not own the data it points to. It simply provides a view, or window, into some part of another string or character array.
- Rebindable: A std::string_view can be rebound to different strings during its lifetime, similar to how a pointer can be reassigned to point to a different address.
- Can be null: A std::string_view can refer to no string at all, analogous to a null pointer. A default-constructed std::string_view has data() == nullptr and size() == 0, but unlike a null pointer it is still safe to use: it simply behaves as an empty string.
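Here is a small sketch of the last two points (the function names are hypothetical). One related pitfall: constructing a std::string_view from a null const char* is undefined behavior, because the constructor measures the string's length, and C++23 deletes the constructor that takes nullptr directly. A default-constructed view is the safe way to get an empty view with a null data():

```cpp
#include <cassert>
#include <string>
#include <string_view>

// A default-constructed view refers to no characters at all.
bool defaultViewIsNullAndEmpty() {
    std::string_view sv;
    return sv.data() == nullptr && sv.size() == 0 && sv.empty();
}

// Rebinding: assigning a new source changes what the view refers to,
// much like reassigning a pointer; neither source string is modified.
std::string_view rebindDemo(const std::string& first, const std::string& second) {
    std::string_view sv = first;
    sv = second;  // now views `second`
    return sv;
}
```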
5.2 Similarities with references
- No need for dereferencing: Unlike with a pointer, you can access the characters of the underlying string directly through a std::string_view, without dereferencing it. In this regard, it feels more like working with a reference or a regular object.
std::string_view sv = "Hello";
char firstChar = sv[0]; // Directly access the first character
std::cout << firstChar << '\n'; // Outputs: H
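Comparison is another place where std::string_view behaves like a regular object rather than a pointer. Applying == to two views compares their characters, whereas applying == to two const char* values compares addresses. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <string_view>

// Compares addresses: two separate buffers are never equal, even with the same text.
bool pointersEqual(const char* a, const char* b) { return a == b; }

// Compares characters, just like comparing two std::string objects.
bool viewsEqual(std::string_view a, std::string_view b) { return a == b; }
```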
6. Q&A
Question 1.
What is wrong with the following code?
std::string_view sv = std::string("Hello") + " World!";
Let's break down this code:
- std::string("Hello") + " World!": This creates a temporary std::string with the value "Hello World!".
- std::string_view sv = ...: This creates a string_view that points to the memory of that temporary std::string. After this line of code completes, the temporary std::string is destroyed, leaving sv a dangling string_view.
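There are two ways to avoid the dangling view, sketched here with hypothetical helper names. One is to give the concatenated std::string a name so it outlives the view. The other is to use the temporary only within a single full expression, for example as the argument to a function that takes std::string_view, since a temporary lives until the end of the full expression that creates it:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Safe: the view parameter is only used during the call, while the
// caller's temporary is still alive.
std::size_t countSpaces(std::string_view sv) {
    std::size_t count = 0;
    for (char c : sv) {
        if (c == ' ') ++count;
    }
    return count;
}

// Safe: the concatenated std::string has a name, so it outlives the view.
std::string_view::size_type namedOwnerLength() {
    std::string greeting = std::string("Hello") + " World!";
    std::string_view sv = greeting;  // valid while `greeting` is alive
    return sv.size();
}
```

Here namedOwnerLength() returns 12 and countSpaces(std::string("Hello") + " World!") returns 1. The problem in the question arises only because sv outlives the statement that created the temporary.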