Extra Credit Presentation: std::string_view
1. Introduction
The std::string_view class is a new addition to the C++ standard library in C++17. It is a lightweight "view" to a string:
std::string_view sv = "Hello, world!";
... but it does not own the string it points to:
std::string_view getFirstWord() {
    std::string temp = "Hello";
    return temp; // Warning: This will dangle!
}
It is used because it avoids the overhead of allocating memory and copying the string, while still providing a read-only interface that closely resembles std::string. It is especially useful for passing strings to functions without copying them.
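As a rough illustration of the function-passing point, the sketch below uses a hypothetical helper, countVowels (not part of the standard library), that takes a std::string_view. The same function accepts a std::string, a string literal, or a slice of a larger string, and none of the calls copies the characters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts lowercase ASCII vowels in the text it is given.
// Taking std::string_view by value copies only a pointer and a length.
std::size_t countVowels(std::string_view text) {
    std::size_t count = 0;
    for (char ch : text) {
        if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') {
            ++count;
        }
    }
    return count;
}

// Calls the helper with three different kinds of argument.
std::size_t countVowelsDemo() {
    std::string owned = "string";                                       // 1 vowel
    const char* literal = "view";                                       // 2 vowels
    std::string_view part = std::string_view("overhead").substr(0, 4);  // "over": 2 vowels
    return countVowels(owned) + countVowels(literal) + countVowels(part);
}
```

With a const std::string& parameter, the second call would have to build a temporary std::string first.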
2. string vs. string_view
2.1 Construction from string literals
When constructing a string from a string literal, std::string_view is a clear winner. A benchmark run on OnlineGDB showed a roughly 10-20 times speedup when constructing a std::string_view from a string literal, compared to constructing a std::string from the same literal.
What is happening under the hood?
- Constructing from std::string:
  - Allocate memory for the string.
  - Copy the string literal into the allocated memory.
std::string str = "Hello, World!";
- Constructing from std::string_view:
  - Simply points to the (read-only) memory of the string literal while maintaining a size.
std::string_view sv = "Hello, World!";
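One way to observe the difference is to compare data pointers. The sketch below uses two hypothetical helper functions (the names are only for illustration): a view built from a std::string reports the same data() pointer as the string, while a second std::string holds its own copy of the characters.

```cpp
#include <string>
#include <string_view>

// Returns true when the view looks directly at the string's own character buffer.
bool viewSharesBuffer(const std::string& str) {
    std::string_view sv = str;   // No characters are copied
    return sv.data() == str.data() && sv.size() == str.size();
}

// Returns true when a second std::string stores an equal but separate copy.
bool copyHasOwnBuffer(const std::string& str) {
    std::string copy = str;      // Characters are copied into storage owned by copy
    return copy.data() != str.data() && copy == str;
}
```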
2.2 Similarities with std::string
- Supports Random Access:
std::string_view sv = "Hello";
char ch2 = sv[1];
- Supports non-modifying string operations (e.g., find, substr, compare):
std::string_view sv = "Hello, World!";
size_t pos = sv.find("World");
- Supports iterators, though they are always const iterators:
for (auto it = sv.begin(); it != sv.end(); ++it) {
    // Do something with *it
}
2.3 Differences from std::string
- Immutable:
std::string_view sv = "Hello";
// sv[0] = 'Y'; // Compilation error
- Lacks memory ownership/allocation:
std::string str = "Hello"; // Memory allocation here
std::string_view sv = str; // No additional memory allocation
- Lacks support for string operations that modify the underlying string:
std::string_view sv = "Hello";
// sv.append(" World"); // No such function
2.4 Passing string to function
I came across someone on Reddit claiming that "You always prefer std::string_view over const string&." Is that so?
Consider when you:
- Need to modify the string.
- Want to return a string and ensure its longevity.
- Interact with APIs mandating std::string.
In such cases, std::string_view isn't the optimal choice.
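The second point can be sketched as follows, assuming a hypothetical firstWord function. Taking std::string_view as the parameter is fine because the input is only read, but the return type is std::string so that the caller owns the result and it cannot dangle.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// The parameter is only read, so a view is enough.
// The result is returned as std::string: the caller gets its own copy of the
// characters, which stays valid after any temporaries are gone.
std::string firstWord(std::string_view text) {
    std::size_t end = text.find(' ');
    // substr clamps the count, so npos (no space found) yields the whole text.
    return std::string(text.substr(0, end));
}
```

Returning text.substr(0, end) as a std::string_view would avoid the copy, but the result would then be tied to the lifetime of whatever the caller passed in.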
3. Null Termination
It should be noted that the result of std::string_view::substr() is not null terminated. This makes sense since it points to a substring of the original string, and it's very likely that the substring does not end with a null character. Typically this isn’t an issue because std::string_view maintains the length of data it points to. However, it’s best not to use std::string_view when working with APIs that require C-style strings:
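void copyToBuffer(const char* source, char* dest) {
    strcpy(dest, source); // Copies until it finds a '\0' in source
}
std::string original = "Hello, world!";
std::string_view view(original);
view = view.substr(0, 5); // Now, the view is "Hello"
char buffer[10];
// Unsafe! The view's data isn't null-terminated after "Hello", so strcpy
// keeps reading the rest of original and writes past the end of buffer.
copyToBuffer(view.data(), buffer);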
There are two ways to fix this:
- Use strncpy() and manually add a null character to the end of the buffer.
- Convert the std::string_view to a std::string before passing it to the function, but in this case, you might as well just use std::string.
4. Simple Parser Example
std::string_view is particularly effective in parsing. For tasks that heavily process strings, we can load a text file into memory as a single block and then tokenize it, creating a std::string_view for each token. This provides a lightweight reference to each token and offers nearly all the advantages of std::string without any extra heap allocations. Previously, to avoid creating many small std::string objects, we would have needed to devise such a mechanism ourselves.
Let's examine a simple parser that divides a string into tokens using a comma as the delimiter. First, here's the approach using std::string:
#include <string>
#include <vector>
#include <sstream>
std::vector<std::string> tokenize(const std::string& input, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string token;
    // Loop through the string, extracting tokens separated by the delimiter.
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
In this implementation:
- We convert the input string into a stream using std::istringstream.
- The std::getline function reads from this stream, splitting the string at each occurrence of the delimiter.
- Each split portion (token) is added to the tokens vector.
For comparison, here's how we can achieve this with std::string_view:
#include <string_view>
#include <vector>
std::vector<std::string_view> tokenize(std::string_view input, char delimiter) {
    std::vector<std::string_view> tokens;
    size_t start = 0;
    size_t end = input.find(delimiter);
    // Loop through the string_view, extracting tokens separated by the delimiter.
    while (end != std::string_view::npos) {
        tokens.push_back(input.substr(start, end - start));
        start = end + 1;
        end = input.find(delimiter, start);
    }
    tokens.push_back(input.substr(start));
    return tokens;
}
In the std::string_view version:
- We use the find method to locate each delimiter's position in the string.
- The substr method then creates a new string_view pointing to the substring between two delimiter positions, without copying any characters.
- This process is repeated until the entire string is tokenized.
When tested, the std::string_view method was observed to be approximately 6 times faster than its std::string counterpart.
5. Is it more like a pointer or reference?
5.1 Similarities with pointers
- Non-Owning:
  - Just like a pointer, a std::string_view does not own the data it points to. It simply provides a view or a window into some part of another string or character array.
- Rebindable:
  - A std::string_view can be rebound to point to different strings during its lifetime, similar to how pointers can be reassigned to point to different addresses.
- Can be null:
  - It's possible for a std::string_view to not point to any string, analogous to a null pointer. A default-constructed std::string_view is exactly this case: its data() is nullptr and its size() is 0, so it simply behaves as an empty view.
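The pointer-like behavior can be sketched with two small hypothetical functions: one inspects a default-constructed view, and the other rebinds a single view to two different string literals.

```cpp
#include <string_view>

// Returns true if a default-constructed view is empty and holds a null data pointer.
bool defaultViewIsEmpty() {
    std::string_view sv;            // Default-constructed: views nothing
    return sv.empty() && sv.size() == 0 && sv.data() == nullptr;
}

// Rebinds one view to a second literal, like reassigning a pointer.
std::string_view rebindDemo() {
    std::string_view sv = "first";
    sv = "second";                  // Now views a different character array
    return sv;                      // Safe: string literals live for the whole program
}
```

Unlike a null pointer, the default-constructed view can be queried with empty() or size() without any prior check.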
5.2 Similarities with references
- No need for dereferencing:
  - Unlike pointers, you can directly access the elements of the underlying string that a std::string_view points to, without having to dereference. It feels more like working with a reference or a regular object in this regard.
std::string_view sv = "Hello";
char firstChar = sv[0]; // Directly access the first character
std::cout << firstChar << '\n'; // Outputs: H
6. Q&A
Question 1.
What is wrong with the following code?
std::string_view sv = std::string("Hello") + " World!";
Let’s break down this code:
- std::string("Hello") + " World!": This creates a temporary std::string with the value "Hello World!".
- std::string_view sv: This creates a string_view that points to the memory of the temporary std::string. After this line of code completes, the temporary std::string with value "Hello World!" gets destroyed, making sv a dangling string_view.
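One possible fix, sketched here with a hypothetical joinGreeting function, is to give the concatenated string a name so that it outlives the view, and to copy out of the view before that owner is destroyed.

```cpp
#include <string>
#include <string_view>

// Keeps the std::string alive in a named variable, then views it.
std::string joinGreeting() {
    std::string owner = std::string("Hello") + " World!";  // Named object owns the characters
    std::string_view sv = owner;                            // Valid for as long as owner lives
    return std::string(sv);                                 // Copy out before owner is destroyed
}
```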