Friday, August 28, 2015

Different approaches to dump and load objects using Python. Performance analysis.


The previous post gave an example of using the JSON format to store an object's value.

There are several ways to store and load (that is, serialize and deserialize) the data that our application collects while it runs.
Before choosing one, we should think about what we will do with that data. Will we load it back into our own application? Will we send it to another application, possibly written in a language other than Python? Should the stored data be in a human-readable format?
Which approaches are available to us in Python?
  1. JSON (http://json.org/)
  2. XML (http://www.w3.org/XML/)
  3. YAML (http://yaml.org/ and RFC is here)
  4. Pickle (Python 2 Docs, Python 3 Docs)
JSON, XML and YAML store information in a human-readable format, and data in these formats can be loaded from most modern programming languages. Pickle is a Python-specific module. Its advantage is that it can dump objects more complicated than the standard data structures and types, without requiring special methods in our classes to convert their instances into something that can be written as XML, JSON or YAML.
So, it is time to look at each item in a bit more detail.

First of all, keep in mind that for serialization and deserialization we want a simple approach that lets us store and load our data (simple variables, collections, objects) without trouble. We will therefore look only at modules that provide both dump and load functions.

JSON 

This format is supported by a Python standard module, json (module description for Python 3.X). As you can see from the documentation, the functionality of this module is pretty simple: it serializes an object to a string or a stream and deserializes it back. The types that can be encoded and decoded with this module include: collection types (dict, list and tuple); string types (string and unicode string); numeric types (int, long and float); the Boolean values True and False; and of course None.
This library provides both a dump function and a load function, so JSON is chosen for our experiment.
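To make the dump and load pair concrete, here is a minimal sketch, assuming Python 3. The sample dictionary is made up for illustration. It shows both the string variants (dumps/loads) and the stream variants (dump/load).

```python
import io
import json

# Made-up sample data covering the supported types.
data = {"name": "example", "values": [1, 2.5, True, None], "point": (3, 4)}

# dumps/loads work with strings.
text = json.dumps(data, sort_keys=True)
restored = json.loads(text)

# dump/load work with file-like objects (streams).
stream = io.StringIO()
json.dump(data, stream)
stream.seek(0)
from_stream = json.load(stream)

# JSON has no tuple type: the tuple is written as an array
# and therefore comes back as a list.
print(restored["point"])  # [3, 4]
```

Note that the round trip is not always exact: a tuple goes in and a list comes out, so compare loaded data with that in mind.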

XML

XML support in Python is implemented in several modules. Full information can be found here. As I haven't found a module with the required functionality, we would have to write our own module for serialization and deserialization, based for example on the Expat module. That is not the purpose of this article, so we will not use XML for this experiment.

YAML

The PyYAML module can be used to work with this markup language. It provides simple functions to dump and load information. A comparison with JSON is available here, and with XML here. This module has both dump and load functions, so we will use it for our experiment.
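A minimal sketch of the same round trip with PyYAML, assuming the package is installed (pip install PyYAML). I use the safe_dump and safe_load variants here, which are limited to plain data types. The sample dictionary is made up for illustration.

```python
import yaml  # third-party package: PyYAML

# Made-up sample data built from plain types only.
data = {"name": "example", "values": [1, 2.5, True, None]}

# safe_dump returns a YAML document as a string.
text = yaml.safe_dump(data)

# safe_load parses the document back into Python objects.
restored = yaml.safe_load(text)

print(text)
```

The library also provides yaml.dump and yaml.load. For data that may come from an untrusted source, the safe variants are the more cautious choice.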

Pickle

Pickle is a standard module that serializes and deserializes Python objects. It converts an object hierarchy into a byte stream (the 'pickling' operation) and restores the object hierarchy from a byte stream (the 'unpickling' operation). This format is compared with JSON in the following article for Python 3.X.
Pickle is a standard module and provides both a dump function and a load function, so we will use it as part of our experiment.
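The sketch below illustrates the advantage mentioned earlier: pickle can round-trip an instance of our own class as it is, while json refuses it unless we convert it ourselves. The Point class is hypothetical and exists only for this example.

```python
import json
import pickle


class Point(object):
    """Hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


original = Point(1, 2)

# Pickling: object -> bytes. Unpickling: bytes -> new object.
blob = pickle.dumps(original)
restored = pickle.loads(blob)

# The json module does not know how to encode a Point instance.
try:
    json.dumps(original)
    json_ok = True
except TypeError:
    json_ok = False

print(restored.x, restored.y, json_ok)
```

Keep in mind that unpickling can execute arbitrary code, so pickle should only be used to load data you trust.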

Tuesday, August 4, 2015

Python unittest. A bit more practice after theory. Basics and a bit more than basics.

The main purpose of this article is to share my experience of using the unittest module in Python.

I chose Python 2.7.10 for this article, but I expect the same behavior with Python 3.X.
Documentation for the unittest module is available at https://docs.python.org/2/library/unittest.html. The documentation site is easy to navigate, and you can simply switch to the same topic for another version of Python.
In this article I will show source code snippets with links to GitHub, where you can find the full versions.
To begin with, I have prepared a special class whose functionality we will use to demonstrate unittest.
The class can be used for the following:
  1. To generate a dictionary with random keys and values
  2. To store the generated dictionary as a JSON object in a file (using the Python standard library)
  3. To load the JSON object from the file back into a dictionary
The latest version of the class can be found here (GitHub).
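To give an idea of what such a class might look like, here is a rough sketch. It is not the code from the GitHub repository: the class name RandomDictStore and its methods generate, save and load are my assumptions for illustration only.

```python
import json
import os
import random
import string
import tempfile


class RandomDictStore(object):
    """Hypothetical stand-in for the class described above."""

    def generate(self, size=5):
        # Build a dictionary with random 8-letter keys and integer values.
        result = {}
        while len(result) < size:
            key = "".join(random.choice(string.ascii_lowercase) for _ in range(8))
            result[key] = random.randint(0, 1000)
        return result

    def save(self, data, path):
        # Store the dictionary as a JSON object in a file.
        with open(path, "w") as handle:
            json.dump(data, handle)

    def load(self, path):
        # Load the JSON object from the file back into a dictionary.
        with open(path) as handle:
            return json.load(handle)


store = RandomDictStore()
generated = store.generate(size=5)
path = os.path.join(tempfile.mkdtemp(), "data.json")
store.save(generated, path)
loaded = store.load(path)
```

Keys are strings and values are integers on purpose, so that the loaded dictionary compares equal to the generated one.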
Now let's start by preparing a simple list of unit tests. You can find basic information about unit testing on Wikipedia; it provides enough background to get started.
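As a warm-up, here is a minimal sketch of what a unittest test case looks like. It checks the standard json module rather than the class from GitHub, and the name JsonRoundTripTest is made up for this example.

```python
import json
import unittest


class JsonRoundTripTest(unittest.TestCase):
    """Illustrative test case; the names are made up for this example."""

    def setUp(self):
        # setUp runs before every test method.
        self.data = {"a": 1, "b": [1, 2, 3]}

    def test_round_trip_preserves_data(self):
        self.assertEqual(json.loads(json.dumps(self.data)), self.data)

    def test_unsupported_type_raises(self):
        # A plain object() instance cannot be encoded as JSON.
        with self.assertRaises(TypeError):
            json.dumps({"a": object()})
```

Saved to a file, tests like these can be run with python -m unittest followed by the module name.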

Let's switch to practice.