Lecture 2: Object Persistence and Serialization, Iterator Invalidation, First-Class and Second-Class Concepts, Roles of Classes in C++
Date: 2023-08-01
1. Object Persistence and Serialization
1.1 Object Persistence
Object persistence is the ability to save the state of an object between program executions.
1.2 Serialization
Serialization helps persist objects by converting them into a stream of bytes that can be saved to a file or sent over a network. Deserialization is the reverse process of converting a stream of bytes back into an object.
a. Ways to Serialize Objects
- Binary Techniques:
  - Examples: Protocol Buffers, Cap'n Proto, ASN.1, BSON, etc.
  - Advantages:
    - Fast
    - Compact
    - Easy to use
  - Disadvantages:
    - Not human readable
    - Not language independent when the format is a raw memory dump (schema-based formats such as Protocol Buffers are designed to be language neutral)
    - Source control issues (binary files do not diff or merge well)
- Text/Human-Readable Techniques:
  - Examples: JSON, XML, YAML, etc. (formats like CSV are not hierarchical and are less useful for complex data)
  - Advantages:
    - Human readable
    - Language independent
    - Source control friendly
  - Disadvantages:
    - Slower to read and write
    - Larger on disk and on the wire
    - Hard to use
    - Requires parsing
Both approaches need to detect and handle corrupted data.
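One common way to detect corruption, in either a binary or a text format, is to store a checksum next to the payload and recompute it when loading. The sketch below uses the 32-bit FNV-1a hash purely for illustration. The names fnv1a, frame, and unframe and the "checksum:payload" layout are assumptions made for this example, not part of the lecture; a real system would more likely use CRC32 or a cryptographic hash.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a hash, used here as a simple (non-cryptographic) checksum.
std::uint32_t fnv1a(const std::string& data) {
    std::uint32_t hash = 2166136261u;   // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 16777619u;              // FNV prime
    }
    return hash;
}

// Hypothetical framing: "<checksum in decimal>:<payload>".
std::string frame(const std::string& payload) {
    return std::to_string(fnv1a(payload)) + ":" + payload;
}

// Writes the payload to payload_out and returns true only if the stored
// checksum matches the checksum recomputed from the payload.
bool unframe(const std::string& framed, std::string& payload_out) {
    const std::string::size_type colon = framed.find(':');
    if (colon == std::string::npos) return false;
    const std::string payload = framed.substr(colon + 1);
    if (framed.substr(0, colon) != std::to_string(fnv1a(payload))) return false;
    payload_out = payload;
    return true;
}
```

A checksum like this only detects accidental damage. Handling it (retrying, falling back to a backup, or reporting an error) is a separate decision for the caller.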
b. Two Serialization Strategies
- Embedded:
  - What is it: Objects themselves handle both serialization and deserialization. This means that the object's data and the logic for converting it to and from a serialized format are bundled together.
  - Pros:
    - Self-contained: Objects know how to serialize/deserialize themselves, making the process straightforward.
    - High cohesion: Keeps data and logic for serialization closely tied.
  - Cons:
    - Lack of flexibility: Changes to the serialization format might require changes to the object itself.
    - Potential for redundancy: Metadata or logic for serialization can be duplicated across similar objects.
  - Examples: Python's pickle module (a class can define __getstate__ and __setstate__ to control its own pickling).
  - Commonality: Often found in languages or systems where ease-of-use and simplicity are a priority.
- External:
  - What is it: A separate entity is responsible for taking an object's state and turning it into a serialized form, as well as taking serialized data and restoring it to an object's state. The object itself is not responsible for understanding the serialization format.
  - Pros:
    - Flexibility: The object and its serialized form are decoupled, allowing each to change independently.
    - Efficiency: Metadata or serialization logic can be centralized, avoiding redundancy.
  - Cons:
    - Complexity: Requires managing both the object and its external serialization logic.
    - Potential for mismatch: If the external entity's logic is not kept in sync with the object, errors can occur.
  - Examples: Protocol Buffers with separate .proto files for schema definitions.
  - Commonality: Common in systems where performance and flexibility are priorities, such as databases and networking protocols.
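The difference between the two strategies can be sketched in C++ as follows. This is a minimal illustration, not a production design: PointEmbedded, Point, PointTextCodec, and the space-separated text format are all hypothetical names and choices made for this example.

```cpp
#include <sstream>
#include <string>

// Embedded strategy: the class itself knows its serialized format.
class PointEmbedded {
public:
    PointEmbedded(int x = 0, int y = 0) : x_(x), y_(y) {}

    std::string serialize() const {
        return std::to_string(x_) + " " + std::to_string(y_);
    }
    static PointEmbedded deserialize(const std::string& text) {
        std::istringstream in(text);
        PointEmbedded p;
        in >> p.x_ >> p.y_;
        return p;
    }
    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_;
    int y_;
};

// External strategy: the class is plain data and knows nothing about formats.
struct Point {
    int x = 0;
    int y = 0;
};

// A separate component owns the format and can change without touching Point.
struct PointTextCodec {
    static std::string encode(const Point& p) {
        return std::to_string(p.x) + " " + std::to_string(p.y);
    }
    static Point decode(const std::string& text) {
        std::istringstream in(text);
        Point p;
        in >> p.x >> p.y;
        return p;
    }
};
```

With the external version, a second codec (for example a JSON one) could be added without modifying Point, at the cost of keeping each codec in sync with the struct's fields.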
2. Iterator Invalidation in C++: What You Need to Know
Iterator invalidation is a common pitfall in C++ programming that can result in undefined behavior or crashes. This happens when you modify a container in a way that the existing iterators point to elements that no longer exist or have moved to a different memory location. Understanding how each container type behaves is crucial for writing robust code.
2.1 Vector Example
#include <iostream>
#include <vector>

int main() {
    std::vector<int> nums = {1, 2, 3, 4, 5};
    auto iter = nums.begin() + 2; // points to '3'
    nums.push_back(6);            // invalidates 'iter' if the vector reallocates
    // Undefined behavior if 'iter' was invalidated; shown only to illustrate the hazard
    std::cout << *iter << std::endl;
    return 0;
}
In this example, adding an element to the end of the vector invalidates the iterator iter if the operation causes a reallocation of the underlying array, which happens whenever the new size exceeds the current capacity.
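Two common ways to avoid this hazard are sketched below: remembering a position as an index rather than an iterator, or reserving capacity before taking the iterator. The function names are hypothetical, and both functions assume the vector holds at least three elements.

```cpp
#include <cstddef>
#include <vector>

// Option 1: remember the position as an index, which survives reallocation.
int third_after_push_by_index(std::vector<int> nums) {
    const std::size_t index = 2;
    nums.push_back(6);               // may reallocate; indices are unaffected
    return nums[index];
}

// Option 2: reserve capacity first so that push_back cannot reallocate.
int third_after_push_with_reserve(std::vector<int> nums) {
    nums.reserve(nums.size() + 1);   // must happen before taking the iterator
    auto iter = nums.begin() + 2;
    nums.push_back(6);               // new size fits in capacity: no reallocation
    return *iter;                    // 'iter' is still valid
}
```

The index approach is the more robust of the two, because it keeps working even if later code adds more elements than were reserved.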
2.2 List Example
#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> nums = {1, 2, 3, 4, 5};
    auto iter = nums.begin();
    std::advance(iter, 2); // points to '3'
    nums.push_back(6);     // 'iter' is still valid
    nums.remove(1);        // 'iter' is still valid: it does not point to the removed element
    std::cout << *iter << std::endl; // safe: prints 3
    // nums.remove(3) would erase the element 'iter' points to and invalidate it
    return 0;
}
With std::list, insertions never invalidate existing iterators, and a deletion invalidates only the iterators that point to the deleted element.
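A frequent case where this matters is erasing elements while iterating. The sketch below relies on the fact that std::list::erase returns an iterator to the element after the erased one; the function name remove_evens is hypothetical.

```cpp
#include <list>

// Removes every even value from the list and returns how many were removed.
int remove_evens(std::list<int>& nums) {
    int removed = 0;
    for (auto it = nums.begin(); it != nums.end(); ) {
        if (*it % 2 == 0) {
            it = nums.erase(it);   // continue from the element after the erased one
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Calling ++it after nums.erase(it) without using the return value would advance an invalidated iterator, which is undefined behavior.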
2.3 Map Example
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> students = {{1, "Alice"}, {2, "Bob"}, {3, "Carol"}};
    auto iter = students.find(2); // points to {2, "Bob"}
    students[4] = "Dave";         // 'iter' is still valid
    std::cout << iter->second << std::endl; // safe: prints Bob
    students.erase(2);            // 'iter' is now invalidated
    // Dereferencing 'iter' here would be undefined behavior
    return 0;
}
In std::map, insertions never invalidate iterators, and erasing an element invalidates only the iterators that point to the erased key-value pair.
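When a value is still needed after its entry is erased, one option is to move it out before calling erase, so that nothing refers to the erased node afterwards. The helper name take is hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

// Removes 'key' from the map and hands its value to the caller.
// Returns false, leaving name_out untouched, if the key is absent.
bool take(std::map<int, std::string>& students, int key, std::string& name_out) {
    auto it = students.find(key);
    if (it == students.end()) return false;
    name_out = std::move(it->second);   // take the value while 'it' is valid
    students.erase(it);                 // invalidates only 'it'
    return true;
}
```

Iterators to other entries, such as one obtained earlier from students.find(1), remain valid across both the insertion of new keys and this erase.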
3. Understanding First-Class and Second-Class Concepts in Programming
3.1 What Are They?
- First-Class Concepts: These are ideas or entities in your program that are fully and clearly represented, often having their own dedicated classes or data structures.
- Second-Class Concepts: These are ideas that are important but aren't explicitly defined or only haphazardly represented in your code.
3.2 Why It Matters
When you don't give an important concept its proper representation, it makes the code harder to understand, manage, and scale. Think of it like organizing a kitchen: if you've got a dedicated drawer for utensils, it's easy to find what you need. But if you just toss utensils in random drawers each time, cooking becomes a chaotic, time-consuming experience.
3.3 Examples
Bad Practice: Second-Class Concept
Let's say you're developing a weather application. You frequently use the concept of 'Temperature' but only as a simple float.
def get_daily_average_temps():
    return [75.3, 60.5, 84.2]
Here, 'Temperature' is a second-class concept. If you or another developer later need to handle temperatures in Celsius, or attach a timestamp to each reading, the code will quickly become messy and hard to manage.
Good Practice: First-Class Concept
Instead, you could create a Temperature class to encapsulate everything that's relevant to the concept of temperature.
class Temperature:
    def __init__(self, value, unit='F'):
        self.value = value
        self.unit = unit

def get_daily_average_temps():
    return [Temperature(75.3), Temperature(60.5), Temperature(84.2)]
Now, 'Temperature' is a first-class concept. You can easily extend the Temperature class to convert between Fahrenheit and Celsius, attach timestamps, or add any other associated functionality.
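The same idea carries over to C++, the language used in the rest of these notes. The sketch below shows one way the unit conversion could be added once Temperature is a first-class type. The Unit enum and the to() member are assumptions made for this example, not part of the lecture's Python code.

```cpp
enum class Unit { Fahrenheit, Celsius };

class Temperature {
public:
    Temperature(double value, Unit unit = Unit::Fahrenheit)
        : value_(value), unit_(unit) {}

    double value() const { return value_; }
    Unit unit() const { return unit_; }

    // Returns a new Temperature expressed in the requested unit.
    Temperature to(Unit target) const {
        if (target == unit_) return *this;
        if (target == Unit::Celsius) {
            return Temperature((value_ - 32.0) * 5.0 / 9.0, Unit::Celsius);
        }
        return Temperature(value_ * 9.0 / 5.0 + 32.0, Unit::Fahrenheit);
    }

private:
    double value_;
    Unit unit_;
};
```

Because the unit travels with the value, a caller can no longer mix Fahrenheit and Celsius readings without an explicit conversion.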
4. Understanding Roles of Classes in C++
4.1 Introduction
In C++ programming, classes often serve different roles within a system. Recognizing these roles helps in both reading and writing effective, maintainable code. In chapter 24 of The C++ Programming Language, five common roles are outlined: Interface, Abstract, Concrete, Operation, and Handle.
4.2 Naming Conventions
A role-based type naming syntax can be used to quickly identify the role of a class. This involves prefixing the class name with a character that signifies its role:
- I: Stands for Interface (e.g., ICar)
- A: Stands for Abstract (e.g., ACar)
- C: Stands for Concrete (e.g., CCivic)
- O: Stands for Operation (e.g., OCompare)
- H: Stands for Handle (e.g., HStudent)
4.3 Role Descriptions
I: Interface Classes (ICar)
An interface class is a pure abstract (all functions are pure virtual) class that defines the contract other classes must adhere to. Interface classes typically have no data members and only declare method signatures.
class ICar {
public:
    virtual ~ICar() = default; // allows safe deletion through an ICar pointer
    virtual void drive() = 0;
    virtual void stop() = 0;
};
A: Abstract Classes (ACar)
Abstract classes are similar to interface classes but may contain some basic implementation. However, since they have at least one pure virtual function, they cannot be instantiated directly.
class ACar : public ICar {
public:
    void drive() override {
        // Some basic driving logic
    }
    virtual void stop() = 0;
};
C: Concrete Classes (CCivic)
Concrete classes are complete implementations and can be instantiated. They often inherit from one or more abstract or interface classes and implement all required methods.
class CCivic : public ACar {
public:
    void stop() override {
        // Concrete stop logic
    }
};
O: Operation Classes (OCompare)
These classes primarily define a set of operations or behaviors. They usually overload operators, like the function call operator.
class OCompare {
public:
    // const so the comparator can be called through const objects and references
    bool operator()(int a, int b) const {
        return a < b;
    }
};
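Operation classes are typically passed to algorithms that call them like functions. The sketch below defines a second, hypothetical operation class, ODescending, and hands it to std::sort; the helper sorted_descending is likewise an assumption made for this example.

```cpp
#include <algorithm>
#include <vector>

// Operation class: orders integers from largest to smallest.
class ODescending {
public:
    bool operator()(int a, int b) const {
        return a > b;
    }
};

// Sorts a copy of the input using ODescending as the ordering.
std::vector<int> sorted_descending(std::vector<int> values) {
    std::sort(values.begin(), values.end(), ODescending());
    return values;
}
```

Because the ordering is an object, it could also carry state, such as a sort key or a tolerance, which a plain function pointer could not.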
H: Handle Classes (HStudent)
A handle class is a bit more nuanced. It's designed to manage a resource, often providing a level of abstraction over that resource. Handle classes are commonly used for resource management tasks like memory management, file handling, and more. These classes encapsulate the actual resource and provide an API to interact with it.
For instance, in the context of smart pointers like std::shared_ptr or std::unique_ptr, the smart pointer acts as a handle to manage the memory of the pointee.
Let's take another example:
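class HStudent {
private:
    Student* student; // Resource
public:
    explicit HStudent(Student* s) : student(s) {} // Takes ownership of s
    ~HStudent() { delete student; } // Destructor to free resource
    // Copying is disabled so two handles never delete the same Student
    HStudent(const HStudent&) = delete;
    HStudent& operator=(const HStudent&) = delete;
    // Additional methods to interact with Student
    void setName(const std::string& name) {
        student->setName(name);
    }
};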
HStudent acts as a handle to a Student object. It takes care of deleting the Student object when the HStudent object is destroyed, thus managing the memory of the Student resource.
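A handle like this can also delegate ownership to std::unique_ptr instead of calling delete by hand. The sketch below is one possible rewrite, not the lecture's version. The Student class shown is a minimal stand-in, since the notes do not define it, and the name() accessor is an assumption added so the example can be checked.

```cpp
#include <memory>
#include <string>
#include <utility>

// Minimal stand-in for the Student class that the notes assume exists.
class Student {
public:
    void setName(const std::string& name) { name_ = name; }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Handle that owns its Student through std::unique_ptr.
class HStudent {
public:
    explicit HStudent(std::unique_ptr<Student> s) : student_(std::move(s)) {}

    void setName(const std::string& name) { student_->setName(name); }
    const std::string& name() const { return student_->name(); }

private:
    std::unique_ptr<Student> student_;   // freed automatically; not copyable
};
```

With this version no destructor is needed, accidental copies fail to compile, and the handle can still be moved from one owner to another.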
5. A Note on Garbage Collection in C++11
Garbage collection (GC) is the automated process of reclaiming unused memory. While languages like C# have built-in GC, C++ is often criticized for lacking this feature, requiring programmers to manage memory manually.
5.1 The Downsides of Garbage Collection
However, garbage collection isn't a silver bullet. It primarily deals with memory but not other types of resources like file handles, threads, or locks. You still have to manage these manually. Plus, you can't always predict when the garbage collector will run, so it's not a guarantee that memory will be freed when you expect it to be.
5.2 Native Tools for Resource Management in C++
C++ offers a variety of standard library tools that make resource management easier. For example:
- std::string eliminates the need to deal with raw character arrays.
- Container classes like std::vector manage memory for you.
- Smart pointers such as std::unique_ptr and std::shared_ptr help prevent memory leaks.
These tools help you manage resources efficiently, reducing the need for a garbage collector.
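Unlike a garbage collector, these tools release resources at a predictable point: the end of the owning scope. The sketch below makes that visible with a hypothetical Tracked type that counts its live instances; the names Tracked and live_inside_scope are assumptions made for this example.

```cpp
#include <memory>
#include <vector>

// Hypothetical resource that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Creates two resources owned by a vector of smart pointers and
// reports how many are alive while the vector is still in scope.
int live_inside_scope() {
    std::vector<std::unique_ptr<Tracked>> items;
    items.push_back(std::unique_ptr<Tracked>(new Tracked()));
    items.push_back(std::unique_ptr<Tracked>(new Tracked()));
    return Tracked::live;
}   // 'items' is destroyed here, which destroys both Tracked objects
```

After live_inside_scope() returns, Tracked::live is back to zero without any explicit delete. The same pattern applies to files, locks, and other non-memory resources.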
5.3 Legacy Code and Optional Garbage Collection
If you find yourself dealing with a lot of legacy C++ code that uses raw pointers and may leak memory, garbage collection implementations are available for C++ as well. However, they are not widely adopted, partly because the language standard offers little support for them. C++11 began to address this by defining an Application Binary Interface (ABI) and setting some guidelines for garbage collectors. That minimal support was later removed from the standard in C++23.
5.4 C++ Philosophy on Garbage Collection
The C++ approach is more about preventing resource leaks in the first place rather than cleaning up afterward. Hence, garbage collection in C++ is entirely optional and must be explicitly activated.