07 String
1. Introduction to Strings
1.1 What is a String?
In C, a string is essentially an array of characters terminated by a null character ('\0'). Unlike languages like Python or Java, C doesn't have a built-in string type, so we use arrays of char, or pointers to char, to represent strings.
char str1[] = "Hello, world!";
char *str2 = "Hello, world!";
1.2 Importance in Embedded Systems
Strings are crucial in embedded systems for various tasks like parsing messages from sensors, formatting output data, storing configuration parameters, or even just for debugging purposes. However, given the resource constraints, managing strings efficiently is often more critical here than in general-purpose computing.
2. String Basics
2.1 Null-Terminated Strings
The null-terminator ('\0') indicates the end of the string. This is a convention in C and is essential for many standard string functions like strlen, strcpy, etc., to work correctly.
char message[] = {'H', 'e', 'l', 'l', 'o', '\0'};
2.2 String Initialization
Strings can be initialized in different ways:
- Direct Initialization:
char greeting[] = "Hello";
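The initializer is a string literal, which is a sequence of characters enclosed in double quotes. The compiler automatically appends a null-terminator, so greeting has 6 elements. This is the preferred way of initializing strings. Because greeting is an array, it is modifiable and lives in read-write memory (RAM); only the literal used to initialize it may be kept in the read-only memory (flash/ROM) of the microcontroller.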
- Element-by-Element Initialization:
char greeting[5] = {'H', 'e', 'l', 'l', 'o'};
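This array is also stored in the read-write memory (RAM) of the microcontroller, but it has no null-terminator, so it is a plain character array rather than a C string.
Passing it to functions that expect a string, such as strlen or printf with %s, reads past the end of the array, which is undefined behavior. Either add a '\0' (and size the array for it) or always track the length separately.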
2.3 String Length
The length of a string doesn't count the null-terminator. You can get the length using the strlen function from <string.h>.
#include <string.h>
size_t len = strlen("Hello"); // len will be 5
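For character arrays, the sizeof operator gives the size of the array in bytes. That size includes the null-terminator, so it is the storage the string occupies, not its length.
char greeting[] = "Hello";
size_t size = sizeof(greeting); // size will be 6 (5 characters + '\0')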
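The difference between the array size (sizeof) and the string length (strlen) is what tells you how much room is left in a buffer. A minimal sketch, assuming a helper named remaining_space (not part of any library) and assuming the caller passes the size obtained with sizeof where the array is declared:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many more characters can be appended to the
 * string in buf without overflowing it? buf_size is the array size as
 * reported by sizeof at the point where the array is declared. */
size_t remaining_space(const char *buf, size_t buf_size)
{
    size_t len = strlen(buf);         /* characters before '\0' */

    if (len + 1 >= buf_size) {
        return 0;                     /* full, or not a valid string */
    }
    return buf_size - len - 1;        /* keep one byte for '\0' */
}
```

Note that sizeof only reports the array size while the array itself is in scope; inside a function that receives the buffer as a parameter, sizeof yields the size of a pointer instead, which is why the size is passed explicitly here.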
2.4 Accessing String Elements
Just like arrays, you can access individual characters of a string using an index.
char first_letter = greeting[0]; // 'H'
Keep in mind that the index equal to the string length holds the null-terminator, and that accessing an index beyond the end of the array results in undefined behavior.
3. String Manipulation
3.1 String Concatenation
Combining two strings is called concatenation. You can use strcat or strncat from <string.h> to do this. Remember, the destination array must have enough space to hold the concatenated result plus its null-terminator.
char dest[20] = "Hello";
char src[] = " World";
strcat(dest, src); // dest becomes "Hello World"
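Since strcat itself never checks the destination size, the check has to come from the caller. One way to do it is sketched below; append_checked is a hypothetical helper name, and refusing the append outright (rather than truncating) is a design assumption made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to dest only if the result,
 * including the null terminator, fits in dest_size bytes. */
bool append_checked(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t needed = strlen(src);

    if (used >= dest_size || needed >= dest_size - used) {
        return false;                       /* would overflow: dest untouched */
    }
    memcpy(dest + used, src, needed + 1);   /* +1 copies the '\0' too */
    return true;
}
```

With char dest[20] = "Hello", the call append_checked(dest, sizeof(dest), " World") succeeds, while an append that would exceed 20 bytes returns false and leaves dest unchanged.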
3.2 String Comparison
To compare strings, you can use strcmp or strncmp. These functions return zero if the strings are equal, a positive value if the first string is greater, and a negative value if it is smaller.
int cmp = strcmp("apple", "banana"); // cmp will be negative
The C standard only specifies the sign of cmp. Many implementations return the difference between the first non-matching characters (compared as unsigned char), but portable code should only test whether cmp is negative, zero, or positive.
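Because only the sign of the result is portable, it can help to wrap strcmp so that calling code never depends on the exact value. A small sketch; str_equal and compare_sign are hypothetical names introduced for illustration:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when both strings have identical contents. */
bool str_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: normalise strcmp's result to -1, 0 or +1 so
 * callers never rely on an implementation-specific magnitude. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```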
3.3 String Copying
Copying one string into another can be done using strcpy or strncpy.
char src[] = "source";
char dest[20];
strcpy(dest, src); // dest now contains "source"
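However, strcpy does not check the size of the destination, so a source longer than dest causes a buffer overflow. strncpy lets you cap the number of characters copied, but it does not add a null-terminator when the source fills the limit, so terminate the destination yourself.
char src[] = "source";
char dest[20];
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0'; // dest now contains "source"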
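A bounded copy is easier to use correctly when it always terminates the destination and reports truncation. The following is a sketch of such a wrapper, not a standard function; the name copy_bounded and its return convention are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dest, always null-terminating.
 * Returns true if the whole string fit, false if it was truncated. */
bool copy_bounded(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0) {
        return false;                 /* no room even for '\0' */
    }
    size_t len = strlen(src);
    size_t n = (len < dest_size) ? len : dest_size - 1;

    memcpy(dest, src, n);
    dest[n] = '\0';
    return len < dest_size;
}
```

Unlike strncpy, this version does not pad the rest of the destination with zeros, which avoids extra writes on large buffers.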
3.4 String to Integer, Float Conversions
Conversion functions like atoi, atol, and atof convert strings to int, long, and double values, respectively. These functions are declared in <stdlib.h>. They do not report errors, so invalid input such as "abc" cannot be distinguished from a valid "0".
int i = atoi("42"); // i will be 42
long l = atol("42"); // l will be 42
double d = atof("42.0"); // d will be 42.0
On the other hand, itoa, ltoa, and ftoa convert integers, long integers, and floats to strings, respectively. These functions are not part of the C standard; some toolchains provide them, with varying signatures, so you might have to write your own or use snprintf.
char str[20];
itoa(42, str, 10); // non-standard; str will be "42"
ftoa(42.0, str, 10); // non-standard; the argument list depends on the implementation
For itoa, the last argument is the base. For example, itoa(42, str, 16) will convert 42 to the hexadecimal string "2a".
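Where error detection matters, the standard strtol and snprintf functions can stand in for atoi and itoa. The sketch below assumes base-10 input and two hypothetical helper names, parse_int and format_int; note that some embedded C libraries ship a reduced printf family, so availability on a given target is also an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole string as a base-10 int.
 * Returns false on empty input, trailing junk, or out-of-range values. */
bool parse_int(const char *text, int *out)
{
    char *end = NULL;

    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text || *end != '\0') {
        return false;                 /* no digits, or junk after them */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return false;                 /* does not fit in an int */
    }
    *out = (int)value;
    return true;
}

/* Hypothetical helper: write value as decimal text.
 * Returns false if buf is too small for the digits plus '\0'. */
bool format_int(char *buf, size_t size, int value)
{
    int written = snprintf(buf, size, "%d", value);
    return written >= 0 && (size_t)written < size;
}
```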
3.5 Get String Length with strlen
You can find the length of a string using strlen.
size_t len = strlen("Hello"); // len will be 5
3.6 Find First Occurrence of a Character with strchr
This function finds the first occurrence of a character in a string and returns NULL if the character is not present.
char *pos = strchr("Hello", 'e'); // pos will point to the 'e' in "Hello"
3.7 Find Substring with strstr
This function finds the first occurrence of a substring within another string and returns NULL if there is no match.
char *pos = strstr("Hello, world!", "world"); // pos will point to "world!"
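A typical embedded use of strchr is splitting a configuration line of the form key=value. The sketch below is illustrative only: split_key_value is a hypothetical helper, the key=value format is an assumption, and the function modifies its input in place, so it must be given a writable buffer rather than a string literal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "key=value" in place.
 * The '=' is overwritten with '\0', so line must be a modifiable buffer.
 * On success *key and *value point into line. */
bool split_key_value(char *line, char **key, char **value)
{
    char *sep = strchr(line, '=');

    if (sep == NULL || sep == line) {
        return false;                 /* no '=' found, or empty key */
    }
    *sep = '\0';                      /* terminate the key */
    *key = line;
    *value = sep + 1;                 /* may be an empty string */
    return true;
}
```

Checking the returned pointer against NULL before using it is the important part: dereferencing the result of a failed strchr or strstr call is undefined behavior.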
3.8 Tokenize String with strtok
This function can be used to split a string into multiple "tokens" separated by specified delimiters. strtok modifies the string it scans by overwriting delimiters with null characters, so the input must be a writable array, not a string literal.
char str[] = "Hello, world!";
char *token = strtok(str, ", "); // token points to "Hello"
while (token != NULL) {
    printf("%s\n", token);
    token = strtok(NULL, ", "); // NULL to continue from previous position
}
This will print:
Hello
world!
The syntax of strtok is strtok(char *str, const char *delim).
- The first argument (str) is the string you want to tokenize. For the first call, you pass the string you want to break up.
- The second argument (delim) is a string containing all possible delimiter characters.
After the first call, strtok remembers where it left off in the string str. If you pass NULL as the str argument in subsequent calls, strtok continues tokenizing the same string from where it left off.
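Because strtok keeps its position in hidden static state, two pieces of code cannot tokenize different strings at the same time. A re-entrant alternative can be built from the standard strspn and strcspn functions, with the caller holding the cursor. This is a sketch under that assumption; next_token is a hypothetical name, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical re-entrant tokenizer: the caller owns the cursor,
 * so no hidden static state is shared between callers.
 * Returns the next token, or NULL when none remain. */
char *next_token(char **cursor, const char *delims)
{
    char *start = *cursor;

    if (start == NULL) {
        return NULL;
    }
    start += strspn(start, delims);              /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = NULL;
        return NULL;
    }
    char *end = start + strcspn(start, delims);  /* find end of token */
    if (*end != '\0') {
        *end = '\0';                             /* terminate the token */
        *cursor = end + 1;
    } else {
        *cursor = NULL;                          /* reached end of input */
    }
    return start;
}
```

Usage mirrors the strtok loop: initialise a char *cursor to the start of a writable buffer and call next_token(&cursor, ", ") until it returns NULL.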
3.9 Custom String Functions
In embedded systems, you might often write custom string functions to reduce code size or increase efficiency.
// Custom function to get string length
size_t custom_strlen(const char *str) {
    size_t len = 0;
    while (*str++) ++len;
    return len;
}
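A custom function is also a chance to add a safety bound that strlen lacks. The variant below is a sketch; bounded_strlen is a hypothetical name, and the caller is assumed to know the maximum number of bytes that may be read.

```c
#include <stddef.h>

/* Hypothetical bounded variant: never reads more than max_len bytes,
 * so a missing terminator cannot send the loop past the buffer. */
size_t bounded_strlen(const char *str, size_t max_len)
{
    size_t len = 0;

    while (len < max_len && str[len] != '\0') {
        ++len;
    }
    return len;
}
```

A return value equal to max_len signals that no terminator was found within the bound.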
4. Advanced Topics
4.1 Dynamic Strings
In embedded systems, dynamic memory allocation might be risky but sometimes necessary. Dynamic strings are strings whose size can change during runtime. This involves using functions like malloc and realloc.
char *str = (char *)malloc(10); // Allocates 10 bytes
if (str != NULL) {
    strcpy(str, "Hello");
}
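Growing a string with realloc has one classic trap: assigning the result straight back to the original pointer loses the old block if the call fails. A minimal sketch of a growable string follows; struct dyn_str and dyn_str_append are hypothetical names, and growing to exactly the needed size is a simplification chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: buf is heap-allocated (or NULL),
 * cap is its capacity in bytes. */
struct dyn_str {
    char  *buf;
    size_t cap;
};

/* Append src, growing the buffer when needed.
 * On failure the existing contents are left intact. */
bool dyn_str_append(struct dyn_str *s, const char *src)
{
    size_t used = (s->buf != NULL) ? strlen(s->buf) : 0;
    size_t add = strlen(src);
    size_t needed = used + add + 1;              /* +1 for '\0' */

    if (needed > s->cap) {
        /* Keep the old pointer until realloc succeeds; otherwise a
         * failed call would leak the original block. */
        char *grown = realloc(s->buf, needed);
        if (grown == NULL) {
            return false;
        }
        s->buf = grown;
        s->cap = needed;
    }
    memcpy(s->buf + used, src, add + 1);
    return true;
}
```

Start with struct dyn_str s = {NULL, 0}; and release the memory with free(s.buf) when the string is no longer needed.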
4.2 Strings and Pointers
Understanding that strings in C are actually arrays of characters often manipulated via pointers can save both time and memory.
char str1[] = "Hello";
char *str2 = "Hello";
Here, str1 is an array that contains a copy of the string "Hello", while str2 is a pointer that points to a string literal.
4.3 String Literals vs. String Arrays
String literals are read-only and stored in a read-only section of memory, while string arrays can be modified.
char *str1 = "Hello"; // String literal, read-only
char str2[] = "Hello"; // String array, modifiable
4.4 Memory Layout of Strings
Understanding how strings are stored can help you optimize your embedded systems code.
- Null-terminated strings: Stored as arrays with a null character at the end.
- String literals: Usually stored in a read-only section of memory.
- Dynamic strings: Stored in the heap if you're using dynamic memory allocation.
5. String Safety
5.1 Buffer Overflows
Buffer overflows can occur when you write more data into a string than it can hold. This is particularly risky in C, as it can lead to undefined behavior or even system crashes.
char buffer[10];
strcpy(buffer, "This is a string that's too long for the buffer");
Prefer bounded versions like strncpy, and remember to null-terminate the string yourself, because strncpy does not do so when the source is too long.
char buffer[10];
strncpy(buffer, "Too long", sizeof(buffer) - 1);
buffer[sizeof(buffer) - 1] = '\0';
5.2 Immutable Strings
Making a string immutable means that once it's created, it cannot be changed. This is good for string literals and can help prevent inadvertent modifications.
const char *immutableString = "Cannot Change Me";
5.3 String Vulnerabilities and Mitigations
String vulnerabilities, like format string vulnerabilities, can expose your system to security risks.
printf(userInput); // NEVER do this
Instead, always use a format string:
printf("%s", userInput);
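The same rule applies when building a message in a buffer: untrusted text goes in as an argument, never as the format. In the sketch below, build_log_line and the "rx: " prefix are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a log line from untrusted text.
 * user_text is passed as an argument to "%s", never as the format,
 * so any '%' characters in it are copied verbatim. */
bool build_log_line(char *out, size_t size, const char *user_text)
{
    int n = snprintf(out, size, "rx: %s", user_text);
    return n >= 0 && (size_t)n < size;   /* false if truncated */
}
```

An input such as "%x%x%n" ends up in the output unchanged instead of being interpreted as conversion specifiers.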
6. Optimization Techniques
6.1 Efficient String Operations
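Efficiency is key when you're dealing with strings. For instance, avoid using strcat in a loop: each call scans dest from the start to find the terminator, an O(n) operation every time, so building a string piece by piece becomes O(n^2) overall.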
Instead, consider this:
char dest[100];
char *ptr = dest;
strcpy(ptr, "Hello");
ptr += strlen("Hello");
strcpy(ptr, " World");
In this way, the pointer tracks the end of the string, so each append starts writing immediately instead of rescanning dest.
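The end-pointer idea can be combined with a bounds check so that repeated appends stay both fast and safe. The following is a sketch, not an established API; append_at is a hypothetical helper, and limit is assumed to point one past the last byte of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src at end, where end points at the
 * current '\0' and limit points one past the last byte of the buffer.
 * Returns the new end, or NULL if src does not fit. */
char *append_at(char *end, const char *limit, const char *src)
{
    size_t len = strlen(src);

    if ((size_t)(limit - end) <= len) {
        return NULL;                  /* no room for src plus '\0' */
    }
    memcpy(end, src, len + 1);        /* copies the terminator too */
    return end + len;                 /* no rescan of the buffer */
}
```

Each call costs time proportional to the appended piece only, because the returned pointer is fed into the next call.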
7. Best Practices
7.1 When to Use Strings
Strings are often needed for user interfaces, configuration, or protocol handling. However, in embedded systems where memory and processing power are limited, avoid using them for temporary or internal operations where simple integer or byte manipulations would suffice.
7.2 Avoiding Common Pitfalls
Always validate the length of the string before performing operations. Be cautious when using functions like strcpy or strcat that do not check bounds. Use their bounded counterparts like strncpy and strncat when possible, keeping in mind that strncpy may leave the destination without a null terminator.
7.3 Coding Guidelines for Strings
- Avoid global string variables to minimize memory usage.
- Always initialize your strings.
- Avoid "magic numbers"; use defined constants for lengths.
- Use functions that limit length to avoid overflows, such as strncpy instead of strcpy.
8. Common Pitfalls
8.1 Off-by-One Errors
These are common and can result in undefined behavior or crashes. Always remember that strings in C are null-terminated.
char myString[5];
strcpy(myString, "hello"); // Off-by-one error: "hello" needs 6 bytes, so myString should have a length of 6
8.2 Memory Leaks
If you use dynamic memory allocation for strings, always ensure you free the memory after you're done.
char *str = malloc(10);
// Do stuff
free(str); // Don't forget this
9. Q&A
1. Question: What is the difference between a string literal and a string array in C?
Answer: A string literal is a sequence of characters enclosed in double quotes and is stored in a read-only section of memory. For example: char *str = "Hello";. On the other hand, a string array reserves memory in the stack (or in the data section if it's a global or static variable) and can be modified. For example: char str[6] = "Hello";.
2. Question: How is a string represented in C?
Answer: In C, strings are arrays of characters terminated by a null character ('\0'). This null character indicates the end of the string.
3. Question: Why is it risky to use gets() for string input in C?
Answer: The gets() function is dangerous because it has no way of knowing the size of the buffer it is writing to. This can result in buffer overflow, potentially leading to unpredictable behavior, data corruption, or security vulnerabilities. It was removed from the language in C11; use safer alternatives like fgets().
4. Question: In embedded systems, why might you prefer to use fixed-size char arrays over dynamic memory allocation for strings?
Answer: In embedded systems, memory is often limited and dynamic memory allocation can introduce fragmentation issues. Fixed-size char arrays provide predictable memory usage, which is crucial for real-time and resource-constrained environments.
5. Question: Examine the code:
char *str1 = "Hello";
char str2[] = "Hello";
str1[0] = 'J';
What issue might arise from this?
Answer: Modifying a string literal is undefined behavior in C. Since str1 points to a string literal, attempting to change its content can result in runtime errors. On the other hand, str2 is an array and can be modified safely.
6. Question: What function can you use in C to compare two strings, and how does it work?
Answer: You can use the strcmp() function. It compares two strings lexicographically. If the strings are identical, it returns 0. If the first string is lexicographically smaller than the second, it returns a negative value, and if it's larger, it returns a positive value.
7. Question: Why is it important to ensure null termination when working with strings in C?
Answer: Without null termination, functions that operate on strings (like strlen(), strcpy(), etc.) won't know where the string ends. This can lead to reading beyond the allocated memory, buffer overflows, and other unpredictable behaviors.
8. Question: In the context of embedded systems, what is a potential problem with using strcat() without checking sizes?
Answer: strcat() appends one string to another without checking buffer sizes. If the destination buffer isn't large enough to accommodate the concatenated result, it will lead to a buffer overflow. This is risky in any environment but especially so in embedded systems where memory is constrained and recovery from errors might not be straightforward.
9. Question: What do you understand by "string safety" in embedded C?
Answer: "String safety" refers to practices that prevent common string-related vulnerabilities like buffer overflows, off-by-one errors, and uncontrolled format string issues. In embedded C, it emphasizes using bounded string operations, ensuring null termination, avoiding risky functions like gets(), and being wary of user input or data from external sources.
10. Question: Why might string manipulation be more challenging in embedded systems as compared to general-purpose systems?
Answer: Embedded systems often have limited resources, such as RAM and processing power. As a result, operations like dynamic string allocation or complex string manipulations might be inefficient or not feasible. Additionally, embedded systems may not have the full standard C library available, limiting the functions at one's disposal.
11. Question: Given that embedded systems often have memory constraints, what's the potential pitfall of using strdup()?
Answer: The strdup() function dynamically allocates memory for a copy of the string. In embedded systems with limited memory, repeated or unchecked use of strdup() can lead to memory exhaustion or fragmentation. It's crucial to free any memory allocated by strdup() once done and to check for allocation failures.
12. Question: Why is using sprintf() potentially risky, and how can you mitigate its risks?
Answer: The sprintf() function does not check the size of the destination buffer, which can lead to buffer overflows. To mitigate this, you can use snprintf(), which takes the size of the destination buffer as an argument, helping prevent buffer overruns.
13. Question: Consider the following code:
char *stringManipulator(char *src) {
    char localBuffer[50];
    strcpy(localBuffer, src);
    return localBuffer;
}
What's wrong with this function?
Answer: The function returns a pointer to a local variable (localBuffer). Once the function returns, this local variable goes out of scope, and its memory might be overwritten, leading to undefined behavior when trying to access the returned pointer.
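One common way to avoid returning a pointer to a local array is to let the caller supply the buffer, so the storage outlives the call. The version below is a sketch of that approach rather than the only possible fix; string_manipulator and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rewrite: the caller supplies the buffer, so the returned
 * pointer stays valid after the function returns.
 * Returns dest, or NULL if src does not fit. */
char *string_manipulator(char *dest, size_t dest_size, const char *src)
{
    size_t len = strlen(src);

    if (len >= dest_size) {
        return NULL;                  /* would overflow dest */
    }
    memcpy(dest, src, len + 1);       /* includes the '\0' */
    return dest;
}
```

This also addresses the unchecked strcpy in the original function, since the destination size is now part of the interface.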
14. Question: Why is using string functions like strtok() problematic in a real-time embedded system context?
Answer: The strtok() function uses static storage to remember the string being tokenized and its current position. This makes it non-reentrant and not thread-safe. In a real-time embedded system where multiple tasks or interrupts might be using such functions, this can lead to unpredictable behaviors.
15. Question: If you want to store a string that represents a date in the format YYYY-MM-DD in an embedded system, how many characters should you allocate, including the null terminator?
Answer: You should allocate 11 characters. There are 4 for the year, 2 for the month, 2 for the day, 2 for the hyphens (-), and 1 for the null terminator.
16. Question: What's the potential issue with the following code in an embedded system with limited stack size?
void myFunction() {
    char largeString[2048];
    // ... other operations
}
Answer: The code declares a large local array (largeString) on the stack. In embedded systems with limited stack size, this can lead to stack overflow, especially if there are deeper call hierarchies or other local variables consuming stack space.
17. Question: Consider this function:
void printString(char str[100]) {
    printf("%s", str);
}
Can you pass a string of length greater than 100 to this function?
Answer: Yes, you can. In C, array parameters in functions decay to pointers. This means the function actually receives a pointer to char, not a fixed-size array. So, while the function signature suggests a size, it doesn't enforce it, and longer strings can be passed, risking buffer overflows or other undefined behaviors if not handled properly.
18. Question: How can you prevent buffer overflows when reading strings from an external source in an embedded environment?
Answer: Use bounded functions like fgets() or strncpy() that take the size of the destination buffer as an argument. Always ensure null termination and be wary of potential truncation. Also, validate the source and length of input data before processing.
19. Question: How can string pooling benefit embedded systems?
Answer: String pooling consolidates duplicate string literals into a single instance in memory. This can save memory in embedded systems where multiple instances of the same string literal might be used. The linker or compiler typically handles string pooling.
20. Question: What is a zero-cost abstraction in the context of string operations, especially in embedded systems?
Answer: A zero-cost abstraction means that higher-level functionalities or abstractions don't impose additional runtime overhead compared to writing the equivalent functionality in low-level code. In the context of strings in embedded systems, it implies using string manipulations or operations that don't consume extra memory or computational resources beyond what's strictly necessary.
21. Question: Consider the following code:
char *getName() {
    char name[20];
    strncpy(name, "John Doe", sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';
    return name;
}
What's the issue with this function?
Answer: The function is returning a pointer to a local variable (name). This local variable goes out of scope after the function returns, which leads to undefined behavior when trying to access the returned pointer.
22. Question: What will be the output of this code?
char str[] = "embedded";
printf("%c", *(str + 3));
Answer: The output will be e. The expression *(str + 3) is equivalent to str[3], the fourth character of "embedded". This code demonstrates pointer arithmetic with string arrays.
23. Question: In the following code, what will the value of ptr be after the loop?
char *ptr = "programming";
while (*ptr) ptr++;
Answer: ptr will point to the null terminator of the string "programming".
24. Question: Why might the following code be problematic in an embedded system?
char buffer[50];
sprintf(buffer, "%s", "This is a very long string that might exceed the buffer length.");
Answer: There's a risk of buffer overflow because the source string length exceeds the size of the destination buffer.
25. Question: Consider the following code:
char str1[] = "hello";
char str2[] = "world";
strcat(str1, str2);
What's wrong with this code?
Answer: The array str1 is sized to hold exactly "hello" and its null terminator (6 bytes), so it has no room for the concatenated string. strcat writes past the end of the array, causing a buffer overflow.
26. Question: What is the value of str[5] in the code below?
char str[10] = "embedded";
Answer: The value will be 'd'.
27. Question: What will the following code print?
char *str1 = "embedded";
char *str2 = "embedded";
if (str1 == str2) {
    printf("Same");
} else {
    printf("Different");
}
Answer: The output will likely be "Same". This is due to string pooling in many compilers which make both pointers point to the same memory location. However, this behavior is compiler-dependent.
28. Question: What is the potential issue with this function?
void copyStr(char *dest, const char *src) {
    while (*src) {
        *dest++ = *src++;
    }
    *dest = '\0';
}
Answer: The function doesn't check the size of the dest buffer, so there's a risk of buffer overflow if src is longer than the space allocated for dest.
29. Question: What will this code print?
char str[] = "Hello\0World";
printf("%s", str);
Answer: It will print "Hello". The \0 is a null terminator, which indicates the end of the string.
30. Question: Why might this code be risky in terms of memory?
char *reverse(const char *str) {
    int len = strlen(str);
    char rev[len + 1];
    for (int i = 0; i < len; i++) {
        rev[len - i - 1] = str[i];
    }
    rev[len] = '\0';
    return rev;
}
Answer: The function returns a pointer to a local array rev. This memory will be out of scope once the function returns, leading to undefined behavior when trying to access it.
31. Question: Consider the following code:
char *str = "Hello World";
str[4] = 'y';
What's the issue?
Answer: The string literal "Hello World" is stored in a read-only section of memory. Trying to modify it results in undefined behavior.
32. Question: Given the following code:
char name[5];
strcpy(name, "Johnny");
What's the issue?
Answer: Buffer overflow. The strcpy function doesn't check buffer sizes and the source string "Johnny" (with the null terminator) requires 7 bytes, but name only provides 5 bytes.
33. Question: In the code snippet below:
char *getString() {
    char localStr[10];
    strcpy(localStr, "embedded");
    return localStr;
}
What's the problem?
Answer: The function returns a pointer to a local variable. Once the function exits, localStr goes out of scope, making the returned pointer invalid.
34. Question: Consider the code:
char *str1 = "hello";
char str2[10];
str2 = str1;
What's wrong here?
Answer: Arrays cannot be assigned to using the = operator. The correct approach would be to use strcpy() to copy the contents of str1 into str2.
35. Question: In the following code:
char dest[10];
char src[] = "This string is too long!";
strncpy(dest, src, sizeof(dest));
What's the oversight?
Answer: While strncpy does limit the number of characters copied based on sizeof(dest), it doesn't guarantee a null-terminated string. If src is longer than dest, the result will not be null-terminated, potentially leading to undefined behavior in subsequent operations.
36. Question: Given the code:
char str[] = "embedded";
char *ptr = str + 3;
printf("%s\n", ptr);
What will this code print?
Answer: This will print edded. This isn't an error in itself, but can be tricky for those not familiar with pointer arithmetic on strings.
37. Question: Consider the code:
char *str = malloc(10);
if (str) {
    strcpy(str, "Hello");
    // ... other code ...
}
free(str);
str[2] = 'x';
What's the mistake?
Answer: Using a pointer (str in this case) after it has been freed is undefined behavior. The operation str[2] = 'x'; after the free() call is invalid.
38. Question: In this code snippet:
char buffer[10];
snprintf(buffer, 12, "%s", "Hello, World!");
What's the potential issue?
Answer: The buffer size specified in snprintf (12 in this case) exceeds the actual buffer size, which can lead to buffer overflows.
39. Question: Given the code:
char *src = "Hello World";
char dest[5];
strncpy(dest, src, 5);
What might be an unintended result?
Answer: The destination string dest will not be null-terminated since we're copying exactly 5 characters. This can lead to unexpected results when trying to use dest as a string in subsequent operations.
40. Question: Consider the code:
char buffer[50];
fgets(buffer, sizeof(buffer), stdin);
strcat(buffer, " appended text");
What's the potential issue?
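Answer: fgets stores at most 49 characters plus the null terminator, and " appended text" needs another 14 bytes. Any input longer than 35 characters (including the newline that fgets keeps) therefore makes strcat write past the end of the buffer.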
41. Question: Given the following code:
char *name = "John";
name[2] = 'n';
What's the mistake?
Answer: The string literal "John" is stored in a read-only section of memory. Trying to modify it results in undefined behavior.
42. Question: Consider the code snippet:
char str[10];
sprintf(str, "%s", "This is a very long string");
What's the issue?
Answer: Buffer overflow. The sprintf function will write beyond the bounds of the str array because the source string is longer than the available space.
43. Question: In the following code:
char *a = "hello";
char *b = "world";
strcat(a, b);
What's wrong?
Answer: strcat tries to concatenate b onto a, but both a and b are pointers to string literals in read-only memory. This operation is undefined and could cause a segmentation fault.
44. Question: Given this code:
char str[10];
strcpy(str, "embed");
strncat(str, "ded systems", 4);
What's the outcome?
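Answer: strncat appends at most 4 characters of "ded systems", namely "ded " (including the space), and then adds a null terminator. str becomes "embedded ", which is 9 characters plus the terminator: exactly 10 bytes, so it just fits with no room to spare. Appending even one more character would overflow str.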
45. Question: Look at the following:
char *str1 = "test";
char str2[] = "test";
str1 = str2;
Is there a problem?
Answer: No, this is valid. It's setting the pointer str1 to point to the first character of array str2. It's not modifying the original string literal "test" that str1 was pointing to.
46. Question: Given:
char str1[10], str2[10];
gets(str1);
strcpy(str2, str1);
What's the major concern?
Answer: The gets function is dangerous because it doesn't check for buffer overflows, and it was removed from the language in C11. If an input larger than 9 characters (plus the null terminator) is entered, it can overwrite adjacent memory.
47. Question: In this code:
char *s;
s = malloc(5);
if (s) {
    strcpy(s, "longer than allocated");
}
What's the flaw?
Answer: There's a buffer overflow. The allocated memory is only enough for a string of 4 characters plus a null terminator, but a much longer string is copied into it.
48. Question: Consider:
char greeting[] = "Hello, world!";
greeting[13] = '?';
What's the mistake?
Answer: The string "Hello, world!" is 13 characters long plus the null terminator. Trying to access greeting[13] is accessing the null terminator, and modifying it would mean the string is no longer properly terminated.
49. Question: Given:
char str[10] = "hello";
char *p = &str[1];
p[4] = '!';
What's the outcome?
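Answer: This is valid, though a bit tricky. p points to the 'e' in "hello", so p[4] is str[5], the null terminator, which is overwritten with '!'. Because str has 10 elements and the initializer zero-fills the unused ones, str[6] is still '\0', so str now holds the properly terminated string "hello!". Had str been declared as char str[] = "hello";, the array would have only 6 elements and the string would have lost its terminator.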
50. Question: Examine:
char str[10];
str = "embedded";
What's wrong?
Answer: Array names are not modifiable lvalues. We cannot assign to them using the = operator after declaration.