I dedicate this post to Insight Data Engineering. It’s a great program that brought me to California one year ago this week.
I’ve missed several months of blogging because of a new job, buying a condo, and finishing the move across the country. In this simple blog, I’ll share how I like to set up a Python project. There will be no holy wars on editors or my choice to focus on Python. Rather, this is the blog I was looking for when I was learning software engineering best practices in Python. Specifically, it walks through how I would set up a Python project as of mid-2015 with tools including virtualenv, tox, make, pytest, mock, and a basic code structure.
Virtualenv
Virtual environments are essential for developing in Python when non-native Python libraries are needed, like a MySQL client. Run the following command at the top level of your repo to create a directory called virtualenv_run with the Python 2.7 interpreter inside.
virtualenv virtualenv_run --python=python2.7
To enter the new virtualenv, execute source virtualenv_run/bin/activate. I execute that command so often that I have a bash alias, src, for it, and I give all my virtualenvs the same name. While inside the virtualenv, you can install packages with pip and they will be installed in the virtualenv_run directory and not in a directory owned by root (so stop using sudo pip install some_package).
Next, three files need to be added at the top level of the repo: setup.py, requirements.txt, and requirements-dev.txt. These files specify metadata about your repo/project and what dependencies you need. The format of setup.py is shown below. Other arguments including description, author, author_email, and url can also be included.
from setuptools import find_packages
from setuptools import setup

setup(
    name="MyProject",
    version="0.1",
    packages=find_packages(exclude=['tests']),
    setup_requires=['setuptools'],
    install_requires=[
        "MySQL-python"
    ],
)
In this example, the package MySQL-python is listed as a dependency through the install_requires kwarg without a pinned version, so pip will install the latest available version of the package. If your project is a library, it is common practice to declare dependencies in setup.py (rather than requirements.txt) as much as possible. If your project is a standalone application, service, or something with a main, it is more common to list dependencies with pinned version numbers in requirements.txt. See this blog for a more detailed explanation.
The requirements-dev.txt file (shown below) lists the tools needed for testing. Its first line references requirements.txt, so installing the dev requirements also installs the project's own dependencies. Later, I’ll show uses for each package in my adopted workflow.
-r requirements.txt
coverage
ipdb
ipython
flake8
mock
pytest
Below is an example requirements.txt with the MySQL-python dependency pinned. If you have installed packages via pip in your project's virtualenv, you can run pip freeze to get the version numbers of the installed packages. It may be easier to pip install a package first and then read its version from pip freeze; pinning that version ensures a newer release of the dependency doesn’t break your project. Lastly, the -e . line installs the package in the current directory (your project) into the virtualenv.
-e .
MySQL-python==1.2.5
To build a virtualenv with these files, simply execute pip install -r requirements-dev.txt.
Tox
The last component before starting development is to set up tox, a virtualenv-based test automation tool. Tox can run all of your unit tests, check for code style issues, measure test coverage, and even kick off acceptance/integration tests. I’ll cover the first three uses in this blog. Below is an example tox.ini file.
[tox]
envlist = py27

[testenv]
deps = -rrequirements-dev.txt
commands =
    coverage run --source=<YOUR_PROJECT>/,tests/ -m pytest --strict {posargs:tests}
    flake8 .

[flake8]
ignore = E125
max-line-length = 120
max-complexity = 10
exclude = .git,.tox,virtualenv_run
With this tox configuration file, the command tox will know how to run your suite of tests. It automatically builds a clean virtualenv in a directory called .tox. It runs all unit tests in the tests directory. It measures code coverage with the coverage package. Always shoot for 100% code coverage with your tests. Certain lines, like the one that runs main, can be excluded with a .coveragerc file. If main is excluded from coverage, I’d strongly suggest that main consist of only an object creation and a single call to a method like run.
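As a minimal sketch of that advice, a thin entry point might look like the following. The ReportJob class and its numbers argument are hypothetical stand-ins for your own application object. Instead of a .coveragerc entry, this sketch uses the `# pragma: no cover` comment that coverage recognizes by default.

```python
class ReportJob(object):
    """Hypothetical application object; the real logic lives here and is unit tested."""

    def __init__(self, numbers):
        self.numbers = numbers

    def run(self):
        # Return the result instead of printing it so tests can assert on it.
        return sum(self.numbers)


def main():  # pragma: no cover
    # Only an object creation and a single method call, so little is left untested.
    job = ReportJob(numbers=[1, 2, 3])
    print(job.run())


if __name__ == "__main__":  # pragma: no cover
    main()
```

Because run holds all of the behavior, excluding main from coverage hides almost nothing.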
Tox also checks your code for conformance to proper Python coding style with flake8. By default, flake8 is very strict about how whitespace is managed. Feel free to add the error codes that flake8 reports to the ignore list. Before writing code, create a simple makefile with make-venv creating the virtualenv for development and make-test executing tox.
Directory Structure
Finally, let’s write some code! I’ll share my opinions about the high-level structure of your repo. First, create a tests directory for all tests. It is not by chance or accident that I mention tests first; the test code is as important as the production code! Second, create a directory for all of your production code to live in. This name can even be the same as the repo name if no other name seems more appealing.
The remaining parts of the directory structure require a brief discussion about software engineering. Think about separating the functionalities needed for your use cases. You’ll probably have some core objects and some object containers along with logic for manipulating those objects or containers. You’ll also have some end delivery system for your use case, like handling web requests, document creation, or database inserts.
The most important thing is to separate the core objects/logic from the delivery system because it makes switching delivery systems easier, and it makes managing the code easier. At the very least, create a directory called components as a sub-directory of your production code directory; the other sub-directory can be named after the delivery system (e.g., webserver). Also, the directory structure of the tests directory should match the directory structure of the production code, so your tests are easier to find.
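To make the separation concrete, here is a small single-file sketch. The file paths in the comments, the Greeter class, and both delivery functions are hypothetical names chosen for illustration; in a real repo each section would live in its own module under components, webserver, and so on.

```python
import json


# components/greeter.py (hypothetical): core logic that knows nothing about delivery.
class Greeter(object):
    def greet(self, name):
        return "Hello, {0}!".format(name)


# webserver/handlers.py (hypothetical): turns a web request into a call on the core object.
def handle_greet_request(query_params, greeter):
    body = {"message": greeter.greet(query_params["name"])}
    return 200, json.dumps(body)


# cli/commands.py (hypothetical): a second delivery system reusing the same component.
def greet_command(argv, greeter):
    return greeter.greet(argv[0])
```

Swapping the web handler for the command-line function requires no change to Greeter, and Greeter can be tested without any request or argument parsing in the way.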
Pytest
The last thing I’d like to cover is how to use a test framework in Python. I’ll share some examples using pytest and mock as well as quick pointers on leveraging breakpoints in IPython. Test files can be named test_module_to_be_tested.py and placed in the directory that mirrors the module's directory in the production code. I opt for classes for my tests instead of individual functions in a module. Test class names start with “Test” and test method names start with “test_”.
Inside each test class there are typically parameters that can be hardcoded for setting up or interacting with the thing being tested. I put these constants in pytest fixtures, which are objects that are constructed once for each test method that includes them as an argument. Below is an example use of fixtures, assuming that the code sits inside a test class, that you are testing the Person class, and that name is a property of Person.
@pytest.fixture
def some_name(self):
    return "Bob"

@pytest.fixture
def some_person(self, some_name):
    return Person(name=some_name)

def test_person_has_right_name(self, some_person, some_name):
    assert some_person.name == some_name
I strive to put any hardcoded strings or integers at the top of each test class. This makes refactoring the test code or production code easier. In the above example, if we wanted to change the name property to be a namedtuple or name object for additional features, we would only need to change the some_name fixture!
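The refactoring claim can be sketched without pytest. Everything here is an assumption made for illustration: the Person implementation, the Name namedtuple, and the plain functions make_some_name and check_person_has_right_name, which stand in for the some_name fixture and the test method.

```python
from collections import namedtuple

# Hypothetical richer name type that replaces the plain string "Bob".
Name = namedtuple("Name", ["first", "last"])


class Person(object):
    """Hypothetical class under test; name is exposed as a read-only property."""

    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name


def make_some_name():
    # Stand-in for the some_name fixture: the only place the refactor touches.
    return Name(first="Bob", last="Smith")


def check_person_has_right_name(some_person, some_name):
    # Same assertion as the test method; it does not depend on the type of name.
    assert some_person.name == some_name


some_name = make_some_name()
check_person_has_right_name(Person(name=some_name), some_name)
```

The assertion compares the person against whatever the fixture returned, so it keeps passing whether the name is a string or a namedtuple.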
The next essential testing concept is mocking. Use mocking to simulate external functionality like writes to a database, posts to a client for a web response, or simply calls to a logger. Mocks can also be used to isolate the functionality of a class, as will be shown below. Also, it’s a good idea to keep mocks inside contexts using with or contextlib.nested, and maybe a @pytest.yield_fixture if the mock can be reused.
Here is an example where raw result processing will be tested (but not raw result generation/gathering), so we will mock out the retrieval of raw results to isolate what is being tested. Assume that DataProcessor is the class being tested and that get_sum_of_results() calls get_raw_results() en route to summing the raw results.
@pytest.fixture
def data_processor(self):
    return DataProcessor()

@pytest.fixture
def fake_results(self):
    return [0, 2, 4, 6, 10]

@pytest.yield_fixture
def raw_results_patch(
    self,
    data_processor,
    fake_results
):
    with mock.patch.object(
        data_processor,
        'get_raw_results',
        return_value=fake_results
    ) as mock_get_raw_results:
        yield mock_get_raw_results

def test_data_processor_sums_correctly(
    self,
    data_processor,
    raw_results_patch,
    fake_results
):
    summed_results = data_processor.get_sum_of_results()
    assert summed_results == sum(fake_results)
This is a nice way to mock out the raw results so that summing is tested independently of result retrieval. A test that does not include the raw_results_patch yield fixture as an argument will not mock out the get_raw_results call. See mock examples for more on mocking, and the pytest documentation for more on pytest.
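For readers who want to run the patching step on its own, here is a sketch that uses mock.patch.object directly as a context manager, outside of pytest. The DataProcessor body is a hypothetical implementation that only matches the method names used above; the import falls back to the separate mock package when unittest.mock is unavailable.

```python
try:
    from unittest import mock  # Python 3 standard library
except ImportError:
    import mock  # the separate "mock" package on Python 2


class DataProcessor(object):
    """Hypothetical implementation matching the methods named in the text."""

    def get_raw_results(self):
        # Stand-in for an external call (database, web service, ...).
        raise RuntimeError("external system not available in unit tests")

    def get_sum_of_results(self):
        return sum(self.get_raw_results())


data_processor = DataProcessor()
fake_results = [0, 2, 4, 6, 10]

# While the with block is active, get_raw_results returns fake_results.
with mock.patch.object(
    data_processor, "get_raw_results", return_value=fake_results
) as mock_get_raw_results:
    summed_results = data_processor.get_sum_of_results()

assert summed_results == sum(fake_results)
mock_get_raw_results.assert_called_once_with()

# Outside the with block the patch is undone, so the real method runs again.
try:
    data_processor.get_raw_results()
    patch_was_undone = False
except RuntimeError:
    patch_was_undone = True
```

Keeping the patch inside a context is what guarantees it is removed afterwards, which is why tests that do not request the fixture see the real method.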
Sometimes, I like to step into the code at a breakpoint using IPython to explore some language semantics, experiment with mocking, or debug why my test isn’t passing. To do so with the setup endorsed in this blog, I use the ipdb package and invoke the test run without using tox. Specifically, at the spot in the code I want to investigate, I paste in import ipdb;ipdb.set_trace(). Then, I run the test with this invocation: python -m pytest -s tests/components/test_a_component.py.
There is a lot of stuff in this post. I hope you enjoyed it and found it helpful! Feel free to share feedback, and thanks for reading!