Lecture 2: Object Persistence and Serialization, Iterator Invalidation, First-Class and Second-Class Concepts, Roles of Classes in C++

Date: 2023-08-01

1. Object Persistence and Serialization

1.1 Object Persistence

Object persistence is the ability to save the state of an object between program executions.

1.2 Serialization

Serialization helps persist objects by converting them into a stream of bytes that can be saved to a file or sent over a network. Deserialization is the reverse process of converting a stream of bytes back into an object.
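
As a minimal sketch of the round trip described above, the following C++ code writes a small object to a byte string and reads it back. The Point type, the function names, and the layout (two 32-bit integers, least significant byte first) are assumptions made for illustration.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type used only for this illustration.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Write a 32-bit value as four bytes, least significant byte first,
// so the format does not depend on the host's byte order.
void write_i32(std::ostream& out, std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);
    for (int shift = 0; shift < 32; shift += 8) {
        out.put(static_cast<char>((u >> shift) & 0xFFu));
    }
}

// Read four bytes back into a 32-bit value; returns false on a short read.
bool read_i32(std::istream& in, std::int32_t& value) {
    std::uint32_t u = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const int byte = in.get();
        if (!in) return false;
        u |= static_cast<std::uint32_t>(static_cast<unsigned char>(byte)) << shift;
    }
    value = static_cast<std::int32_t>(u);
    return true;
}

// Serialization: object -> bytes.
std::string serialize(const Point& p) {
    std::ostringstream out;
    write_i32(out, p.x);
    write_i32(out, p.y);
    return out.str();
}

// Deserialization: bytes -> object. Returns false if the input is too short.
bool deserialize(const std::string& bytes, Point& p) {
    std::istringstream in(bytes);
    return read_i32(in, p.x) && read_i32(in, p.y);
}
```

The returned string could be written to a file or sent over a socket unchanged; calling deserialize on the same bytes restores an equal Point.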

a. Ways to Serialize Objects

  • Binary Techniques:

    • Protocol Buffers, Cap'n Proto, ASN.1, BSON, etc.
    • Advantages:
      • Fast
      • Compact
      • Easy to use
    • Disadvantages:
      • Not human readable
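      • Often tied to a schema or specific tooling; ad hoc binary dumps are not language independent (schema-based formats such as Protocol Buffers are)
      • Source control issues (binary files do not diff or merge well)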
  • Text/Human-Readable Techniques:

    • JSON, XML, YAML, etc. (formats like CSV are not hierarchical and are less useful for complex data)
    • Advantages:
      • Human readable
      • Language independent
      • Source control friendly
    • Disadvantages:
      • Slow
      • Large
      • Hard to use
      • Requires parsing

Both approaches need to detect and handle corrupted data.
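
One common way to detect corruption is to store a checksum next to the payload and verify it on load. The sketch below uses the simple Fletcher-16 checksum; the function names seal and unseal and the choice of checksum are assumptions for illustration, and real formats often use a stronger check such as CRC-32.

```cpp
#include <cstdint>
#include <string>

// Fletcher-16 checksum over the bytes of `data`.
std::uint16_t fletcher16(const std::string& data) {
    std::uint32_t sum1 = 0;
    std::uint32_t sum2 = 0;
    for (unsigned char byte : data) {
        sum1 = (sum1 + byte) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return static_cast<std::uint16_t>((sum2 << 8) | sum1);
}

// Append the checksum (two bytes, high byte first) to the payload.
std::string seal(const std::string& payload) {
    const std::uint16_t sum = fletcher16(payload);
    std::string framed = payload;
    framed.push_back(static_cast<char>(sum >> 8));
    framed.push_back(static_cast<char>(sum & 0xFF));
    return framed;
}

// Returns true and fills `payload` only if the stored checksum matches.
bool unseal(const std::string& framed, std::string& payload) {
    if (framed.size() < 2) return false;
    const std::string body = framed.substr(0, framed.size() - 2);
    const auto hi = static_cast<unsigned char>(framed[framed.size() - 2]);
    const auto lo = static_cast<unsigned char>(framed[framed.size() - 1]);
    const auto stored = static_cast<std::uint16_t>((hi << 8) | lo);
    if (stored != fletcher16(body)) return false;
    payload = body;
    return true;
}
```

A reader that gets false from unseal knows the data was truncated or altered and can refuse to build an object from it. The same wrapper works for binary and text payloads.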

b. Two Serialization Strategies

  • Embedded:

    • What is it: Objects themselves handle both serialization and deserialization. This means that the object's data and the logic for converting it to and from a serialized format are bundled together.

    • Pros:

      • Self-contained: Objects know how to serialize/deserialize themselves, making the process straightforward.
      • High cohesion: Keeps data and logic for serialization closely tied.
    • Cons:

      • Lack of flexibility: Changes to the serialization format might require changes to the object itself.
      • Potential for redundancy: Metadata or logic for serialization can be duplicated across similar objects.
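    • Examples: A class that implements its own save/load methods; Python classes that customize how the pickle module handles them through __getstate__ and __setstate__.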

    • Commonality: Often found in languages or systems where ease-of-use and simplicity are a priority.

  • External:

    • What is it: A separate entity is responsible for taking an object's state and turning it into a serialized form, as well as taking serialized data and restoring it to an object's state. The object itself is not responsible for understanding the serialization format.

    • Pros:

      • Flexibility: The object and its serialized form are decoupled, allowing each to change independently.
      • Efficiency: Metadata or serialization logic can be centralized, avoiding redundancy.
    • Cons:

      • Complexity: Requires managing both the object and its external serialization logic.
      • Potential for mismatch: If the external entity's logic is not kept in sync with the object, errors can occur.
    • Examples: Protocol Buffers with separate .proto files for schema definitions.

    • Commonality: Common in systems where performance and flexibility are priorities, such as databases and networking protocols.
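
The two strategies can be contrasted in a short C++ sketch. All names here (EmbeddedCourse, Course, CourseTextSerializer) and the "code credits" text format are hypothetical, chosen only to show where the format logic lives in each case.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Embedded strategy: the class itself knows the format "code credits".
class EmbeddedCourse {
public:
    EmbeddedCourse(std::string code, int credits)
        : code_(std::move(code)), credits_(credits) {}

    std::string serialize() const {
        std::ostringstream out;
        out << code_ << ' ' << credits_;
        return out.str();
    }

    // Returns false and leaves `course` unchanged on malformed input.
    static bool deserialize(const std::string& text, EmbeddedCourse& course) {
        std::istringstream in(text);
        std::string code;
        int credits = 0;
        if (!(in >> code >> credits)) return false;
        course = EmbeddedCourse(code, credits);
        return true;
    }

    const std::string& code() const { return code_; }
    int credits() const { return credits_; }

private:
    std::string code_;
    int credits_;
};

// External strategy: the data type knows nothing about any format...
struct Course {
    std::string code;
    int credits;
};

// ...and a separate class owns the conversion in both directions.
class CourseTextSerializer {
public:
    static std::string serialize(const Course& course) {
        std::ostringstream out;
        out << course.code << ' ' << course.credits;
        return out.str();
    }

    // Returns false and leaves `course` unchanged on malformed input.
    static bool deserialize(const std::string& text, Course& course) {
        std::istringstream in(text);
        Course parsed{};
        if (!(in >> parsed.code >> parsed.credits)) return false;
        course = parsed;
        return true;
    }
};
```

With the external version, a second serializer for a different format could be added without touching Course; with the embedded version, a format change means editing EmbeddedCourse itself.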


2. Iterator Invalidation in C++: What You Need to Know

Iterator invalidation is a common pitfall in C++ programming that can result in undefined behavior or crashes. It happens when you modify a container so that existing iterators refer to elements that no longer exist or that have moved to a different memory location. Understanding how each container type behaves is crucial for writing robust code.

2.1 Vector Example

#include <vector>
#include <iostream>

int main() {
    std::vector<int> nums = {1, 2, 3, 4, 5};
    auto iter = nums.begin() + 2; // points to '3'
    nums.push_back(6); // might invalidate 'iter'

    // Undefined behavior if 'iter' was invalidated
    std::cout << *iter << std::endl; 

    return 0;
}

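In this example, push_back invalidates iter (along with every other iterator, pointer, and reference into the vector) if the new size exceeds the current capacity, because the elements are then moved to a newly allocated array. If no reallocation happens, only the end() iterator is invalidated.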
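
Two common ways to avoid this pitfall are sketched below: refer to the element by index instead of by iterator, or reserve capacity before taking the iterator. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Option 1: use an index rather than a saved iterator.
// An index stays meaningful even if push_back reallocates.
int value_after_push(std::vector<int>& nums, std::size_t index, int extra) {
    nums.push_back(extra);  // may reallocate; old iterators would be suspect
    return nums[index];
}

// Option 2: reserve first, so the following push_back cannot reallocate.
int value_with_reserve(std::vector<int>& nums, std::size_t index, int extra) {
    nums.reserve(nums.size() + 1);  // any reallocation happens here
    auto iter = nums.begin() + static_cast<std::ptrdiff_t>(index);  // taken after reserve
    nums.push_back(extra);  // capacity suffices, so 'iter' stays valid
    return *iter;
}
```

Both functions assume index is smaller than the vector's size before the call.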

2.2 List Example

#include <iostream>
#include <iterator> // std::advance
#include <list>

int main() {
    std::list<int> nums = {1, 2, 3, 4, 5};
    auto iter = nums.begin();
    std::advance(iter, 2); // points to '3'

    nums.push_back(6); // 'iter' is still valid
    nums.remove(1); // removes a different element, so 'iter' is still valid
    // nums.remove(3) would erase the element 'iter' points to and invalidate it

    // Safe to dereference: 'iter' still points to '3'
    std::cout << *iter << std::endl;

    return 0;
}

With std::list, insertions never invalidate iterators, and a deletion invalidates only the iterators that point to the deleted element.
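
When elements are erased during a traversal, the usual idiom is to continue from the iterator that erase returns. The sketch below assumes a hypothetical helper named remove_evens.

```cpp
#include <list>

// Removes every even value, keeping the loop iterator valid throughout.
void remove_evens(std::list<int>& nums) {
    for (auto it = nums.begin(); it != nums.end(); ) {
        if (*it % 2 == 0) {
            it = nums.erase(it);  // erase returns the iterator after the removed node
        } else {
            ++it;
        }
    }
}
```

Iterators held elsewhere to the surviving (odd) elements remain valid after the call, since only the erased nodes are affected.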

2.3 Map Example

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> students = {{1, "Alice"}, {2, "Bob"}, {3, "Carol"}};
    auto iter = students.find(2); // points to {2, "Bob"}

    students[4] = "Dave";  // insertion: 'iter' is still valid
    students.erase(2); // erases the element 'iter' points to: 'iter' is invalidated

    // Undefined behavior: 'iter' no longer refers to a live element
    std::cout << iter->second << std::endl;

    return 0;
}

In std::map, insertions never invalidate iterators, and erasing an element invalidates only the iterators that point to that element.
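
A simple way to avoid holding a stale map iterator is to look the key up again at the point of use and check the result against end(). The helper name name_or is an assumption for this sketch.

```cpp
#include <map>
#include <string>

// Returns the name stored under `id`, or `fallback` if the entry is gone.
// Calling find() at the point of use avoids keeping an iterator across erases.
std::string name_or(const std::map<int, std::string>& students,
                    int id,
                    const std::string& fallback) {
    const auto it = students.find(id);
    return it != students.end() ? it->second : fallback;
}
```

The lookup costs O(log n) each time, which is usually an acceptable price for not having to track which iterators an erase may have invalidated.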


3. Understanding First-Class and Second-Class Concepts in Programming

3.1 What Are They?

  • First-Class Concepts: These are ideas or entities in your program that are fully and clearly represented, often having their own dedicated classes or data structures.
  • Second-Class Concepts: These are ideas that are important but aren't explicitly defined or only haphazardly represented in your code.

3.2 Why It Matters

When you don't give an important concept its proper representation, it makes the code harder to understand, manage, and scale. Think of it like organizing a kitchen: if you've got a dedicated drawer for utensils, it's easy to find what you need. But if you just toss utensils in random drawers each time, cooking becomes a chaotic, time-consuming experience.

3.3 Examples

Bad Practice: Second-Class Concept

Let's say you're developing a weather application. You frequently use the concept of 'Temperature' but only as a simple float.

def get_daily_average_temps():
    return [75.3, 60.5, 84.2]

Here, 'Temperature' is a second-class concept. If you or another developer later need to handle temperatures in Celsius, or attach a timestamp to each reading, the code will quickly become messy and hard to manage.

Good Practice: First-Class Concept

Instead, you could create a Temperature class to encapsulate everything that's relevant to the concept of temperature.

class Temperature:
    def __init__(self, value, unit='F'):
        self.value = value
        self.unit = unit

def get_daily_average_temps():
    return [Temperature(75.3), Temperature(60.5), Temperature(84.2)]

Now, 'Temperature' is a first-class concept. You can easily extend the Temperature class to convert between Fahrenheit and Celsius, attach timestamps, or add any other associated functionality.
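
The same idea carries over to C++, the language used in the rest of these notes. The sketch below is one possible shape for such a class, including the unit conversion mentioned above; the Unit enum and the member names are assumptions, not part of the lecture's Python example.

```cpp
// Units the hypothetical Temperature class understands.
enum class Unit { Fahrenheit, Celsius };

// Temperature as a first-class concept: the value always travels with its unit.
class Temperature {
public:
    Temperature(double value, Unit unit = Unit::Fahrenheit)
        : value_(value), unit_(unit) {}

    double value() const { return value_; }
    Unit unit() const { return unit_; }

    // Returns an equivalent temperature expressed in `target` units.
    Temperature to(Unit target) const {
        if (target == unit_) return *this;
        if (target == Unit::Celsius) {
            return Temperature((value_ - 32.0) * 5.0 / 9.0, Unit::Celsius);
        }
        return Temperature(value_ * 9.0 / 5.0 + 32.0, Unit::Fahrenheit);
    }

private:
    double value_;
    Unit unit_;
};
```

Because the conversion lives in one place, code that needs Celsius calls to(Unit::Celsius) instead of repeating the formula wherever a bare float is used.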


4. Understanding Roles of Classes in C++

4.1 Introduction

In C++ programming, classes often serve different roles within a system. Recognizing these roles helps in both reading and writing effective, maintainable code. In chapter 24 of The C++ Programming Language, five common roles are outlined: Interface, Abstract, Concrete, Operation, and Handle.

4.2 Naming Conventions

A role-based naming convention can be used to quickly identify the role of a class. It prefixes the class name with a character that signifies its role:

  • I: Stands for Interface (e.g., ICar)
  • A: Stands for Abstract (e.g., ACar)
  • C: Stands for Concrete (e.g., CCivic)
  • O: Stands for Operation (e.g., OCompare)
  • H: Stands for Handle (e.g., HStudent)

4.3 Role Descriptions

I: Interface Classes (ICar)

An interface class is a pure abstract (all functions are pure virtual) class that defines the contract other classes must adhere to. Interface classes typically have no data members and only declare method signatures.

class ICar {
public:
    virtual ~ICar() = default; // allows safe deletion through an ICar pointer
    virtual void drive() = 0;
    virtual void stop() = 0;
};

A: Abstract Classes (ACar)

Abstract classes are similar to interface classes but may contain some basic implementation. However, since they have at least one pure virtual function, they cannot be instantiated directly.

class ACar : public ICar {
public:
    void drive() override {
        // Some basic driving logic
    }
    void stop() override = 0; // still pure virtual, so ACar cannot be instantiated
};

C: Concrete Classes (CCivic)

Concrete classes are complete implementations and can be instantiated. They often inherit from one or more abstract or interface classes and implement all required methods.

class CCivic : public ACar {
public:
    void stop() override {
        // Concrete stop logic
    }
};
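
The relationship between the three roles so far can be checked at compile time. The sketch below repeats the classes with small hypothetical members (distance_, stopped_) so that there is some observable state, and uses std::is_abstract to confirm which classes can be instantiated.

```cpp
#include <type_traits>

class ICar {
public:
    virtual ~ICar() = default;
    virtual void drive() = 0;
    virtual void stop() = 0;
};

class ACar : public ICar {
public:
    void drive() override { ++distance_; }  // shared default behavior
    int distance() const { return distance_; }
private:
    int distance_ = 0;
};

class CCivic : public ACar {
public:
    void stop() override { stopped_ = true; }
    bool stopped() const { return stopped_; }
private:
    bool stopped_ = false;
};

static_assert(std::is_abstract<ICar>::value, "an interface cannot be instantiated");
static_assert(std::is_abstract<ACar>::value, "stop() is still pure virtual in ACar");
static_assert(!std::is_abstract<CCivic>::value, "CCivic implements every pure virtual");

// Client code depends only on the interface, not on any concrete class.
void drive_then_stop(ICar& car) {
    car.drive();
    car.stop();
}
```

Passing a CCivic to drive_then_stop runs ACar's drive and CCivic's stop through the ICar interface.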

O: Operation Classes (OCompare)

These classes primarily define a set of operations or behaviors. They usually overload operators, like the function call operator.

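class OCompare {
public:
    // const: comparing does not modify the comparator, and standard
    // algorithms and containers may call it on a const object
    bool operator()(int a, int b) const {
        return a < b;
    }
};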
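
Operation classes are typically passed to algorithms. As a small sketch, the hypothetical ODescending class below follows the same pattern and is handed to std::sort; the helper function name is also an assumption.

```cpp
#include <algorithm>
#include <vector>

// Operation class: its only job is to define an ordering (descending here).
class ODescending {
public:
    bool operator()(int a, int b) const {
        return a > b;
    }
};

// Returns a copy of `values` sorted from largest to smallest.
std::vector<int> sorted_descending(std::vector<int> values) {
    std::sort(values.begin(), values.end(), ODescending{});
    return values;
}
```

Because the ordering is an object, it can also carry state (for example a sort key) that a plain function could not.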

H: Handle Classes (HStudent)

A handle class is a bit more nuanced. It's designed to manage a resource, often providing a level of abstraction over that resource. Handle classes are commonly used for resource management tasks like memory management, file handling, and more. These classes encapsulate the actual resource and provide an API to interact with it.

For instance, in the context of smart pointers like std::shared_ptr or std::unique_ptr, the smart pointer acts as a handle to manage the memory of the pointee.

Let's take another example:

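// Assumes Student is defined elsewhere with a setName(const std::string&) member.
class HStudent {
private:
    Student* student; // Resource owned by this handle
public:
    explicit HStudent(Student* s) : student(s) {} // Takes ownership of 's'
    ~HStudent() { delete student; } // Destructor frees the resource

    // Copying is disabled: two handles must never delete the same Student
    HStudent(const HStudent&) = delete;
    HStudent& operator=(const HStudent&) = delete;

    // Additional methods to interact with Student
    void setName(const std::string& name) {
        student->setName(name);
    }
};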

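HStudent acts as a handle to a Student object. It takes care of deleting the Student object when the HStudent object is destroyed, thus managing the memory of the Student resource. Because the handle owns the pointer, it must not be copied member-wise: two copies would each delete the same Student.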
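
An alternative is to let std::unique_ptr do the owning, so the handle needs no hand-written destructor. The sketch below assumes a minimal stand-in for Student, since the lecture does not define it, and adds hypothetical name() and empty() members for inspection.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical resource type standing in for the lecture's Student.
class Student {
public:
    void setName(const std::string& name) { name_ = name; }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Handle that owns a Student through std::unique_ptr. The compiler-generated
// destructor frees the Student; copying is disabled and moving is enabled
// automatically because unique_ptr is move-only.
class HStudent {
public:
    explicit HStudent(std::unique_ptr<Student> s) : student_(std::move(s)) {}

    void setName(const std::string& name) { student_->setName(name); }
    const std::string& name() const { return student_->name(); }

    // True after the handle has been moved from and no longer owns a Student.
    bool empty() const { return student_ == nullptr; }

private:
    std::unique_ptr<Student> student_;
};

static_assert(!std::is_copy_constructible<HStudent>::value,
              "the handle cannot be copied by accident");
```

A moved-from handle owns nothing, so setName and name must not be called on it; empty() lets callers check.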


5. A Note on Garbage Collection in C++11

Garbage collection (GC) is the automated process of reclaiming unused memory. While languages like C# have built-in GC, C++ is often criticized for lacking this feature, requiring programmers to manage memory manually.

5.1 The Downsides of Garbage Collection

However, garbage collection isn't a silver bullet. It reclaims memory but not other kinds of resources such as file handles, threads, or locks, which you still have to manage yourself. You also cannot always predict when the garbage collector will run, so there is no guarantee that memory will be freed when you expect it to be.

5.2 Native Tools for Resource Management in C++

C++ offers a variety of standard library tools that make resource management easier. For example:

  • std::string eliminates the need to deal with raw character arrays.
  • Container classes like std::vector manage memory for you.
  • Smart pointers such as std::unique_ptr and std::shared_ptr help prevent memory leaks.

These tools help you manage resources efficiently, reducing the need for a garbage collector.
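
The following sketch shows these tools working together without a single delete. The Tracked type and its alive counter are hypothetical, added only to make the cleanup observable; std::make_unique requires C++14.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    std::string label;

    explicit Tracked(std::string l) : label(std::move(l)) { ++alive; }
    ~Tracked() { --alive; }

    // Non-copyable so the counter always matches the number of live objects.
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::alive = 0;

// Creates two heap objects and returns how many are alive inside the scope.
// When the function returns, the vector destroys its unique_ptrs, and each
// unique_ptr destroys its Tracked object, at a predictable point.
int alive_inside_scope() {
    std::vector<std::unique_ptr<Tracked>> items;
    items.push_back(std::make_unique<Tracked>("first"));
    items.push_back(std::make_unique<Tracked>("second"));
    return Tracked::alive;
}
```

Unlike collection by a garbage collector, the destructors run as soon as the owning scope ends, which is why the same pattern also suits non-memory resources such as files and locks.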

5.3 Legacy Code and Optional Garbage Collection

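If you find yourself dealing with a lot of legacy C++ code that uses raw pointers and could have memory leaks, there are garbage collection implementations available for C++ as well. However, they are not widely adopted, partly because the language standard offers little support for them. C++11 began to address this by specifying what a garbage collector may do if one is used and by adding a small library interface (for example std::declare_reachable and std::get_pointer_safety) to help control it. No major compiler shipped a collector based on this support, and it was removed from the standard in C++23.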

5.4 C++ Philosophy on Garbage Collection

The C++ approach is more about preventing resource leaks in the first place rather than cleaning up afterward. Hence, garbage collection in C++ is entirely optional and must be explicitly activated.