My philosophies for data-scientific code in Python

  1. Never hardcode. Make good use of argparse and YAML configuration files (a sketch follows this list).
  2. Make good use of setuptools. Explicitly pin the versions of the packages you depend on (sketched after the list).
  3. Keep an eye on the Kolmogorov complexity. Never write the same code twice: turn any multiply-used code block into a function and call it from different places. Remember this when extending others’ code, and reuse their code as much as possible.
  4. One function should do only one logical task. This helps when others need to reuse the functionality.
  5. Never dynamically create variables/methods/functions. If such a case must exist, declare it in the README with a proper justification of why it is needed.
  6. In general, I do not like code with a lot of try-except blocks. If an exception can be avoided with a “look-before-you-leap” check, I don’t see much harm in that. But exceptions are NEVER to be silently passed (see the sketch after the list).
  7. Try not to return more than one value from a function. If necessary, return a namedtuple or a dictionary (sketched after the list).
  8. Functions should be defined with default values for their inputs whenever possible.
  9. Never use wildcard imports (from … import *). Such code is difficult to maintain.
  10. Writing unit tests for each object/function is usually overkill. Find a few use cases that together cover 100% of the code and use those as tests.
  11. Avoid sticky code. Sticky code is code that breaks when moved from one place to another, usually because it depends on global variables coming from outside the declared interfaces. So always communicate between objects through established interfaces and hide the implementation details (sketched after the list).
  12. Every class definition and its methods must be accompanied by a docstring.
  13. I prefer functional programming over object-oriented programming. However, the two styles can coexist.
  14. Create classes only when necessary (typically to persist context/state, to avoid sticky code, or to establish natural relationships).
  15. Deep learning, in general, is a streamed optimization process. So I prefer to write deep learning code as a stream of data processed through a sequence of generators (sketched after the list).
  16. Any mathematical or code-ninja trick should be accompanied by a short, no-more-than-one-line comment explaining the intent. This improves readability a lot.
  17. Try to follow PEP 8 as long as it is not in direct opposition to the above points.
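
To make point 1 concrete, here is a minimal sketch of the argparse-plus-YAML pattern, assuming PyYAML is installed; the config file name, the learning_rate field, and the override flag are only illustrative:

```python
import argparse
import yaml  # PyYAML

def parse_args():
    """Parse the command line; every knob comes from a flag or the YAML config."""
    parser = argparse.ArgumentParser(description="Train a model without hardcoded settings.")
    parser.add_argument("--config", default="config.yaml", help="Path to the YAML configuration file.")
    parser.add_argument("--learning-rate", type=float, default=None, help="Optional override of the config value.")
    return parser.parse_args()

def load_config(path):
    """Load the YAML configuration into a plain dictionary."""
    with open(path) as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    args = parse_args()
    config = load_config(args.config)
    if args.learning_rate is not None:  # the command line overrides the config file
        config["learning_rate"] = args.learning_rate
    print(config)
```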
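
For point 2, a minimal setup.py might look like the sketch below; the project name and the exact version pins are placeholders, not recommendations:

```python
# setup.py -- the package name and version pins are only illustrative
from setuptools import setup, find_packages

setup(
    name="my_ds_project",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "numpy==1.26.4",   # pin exact versions so environments are reproducible
        "pandas==2.2.2",
        "PyYAML==6.0.1",
    ],
)
```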
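
On point 6, this is the contrast I have in mind; the file-reading helpers are just illustrative names:

```python
import os
import logging

def read_table_lbyl(path):
    """Look before you leap: check the precondition instead of catching the failure."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Expected input table at {path}")
    with open(path) as f:
        return f.readlines()

def read_table_bad(path):
    """Anti-pattern: the exception is silently swallowed and the caller gets None."""
    try:
        with open(path) as f:
            return f.readlines()
    except OSError:
        pass  # NEVER do this

def read_table_ok(path):
    """If you must catch, at least log and re-raise so the failure stays visible."""
    try:
        with open(path) as f:
            return f.readlines()
    except OSError:
        logging.exception("Could not read %s", path)
        raise
```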
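
For point 7, a namedtuple keeps the call site readable; the SummaryStats fields here are only an example:

```python
from collections import namedtuple
import statistics

# One named result object instead of a bare (mean, std, n) tuple.
SummaryStats = namedtuple("SummaryStats", ["mean", "std", "n"])

def summarize(values):
    """Return the summary statistics of a sequence as a single namedtuple."""
    return SummaryStats(
        mean=statistics.fmean(values),
        std=statistics.stdev(values),
        n=len(values),
    )

stats = summarize([1.0, 2.0, 4.0, 8.0])
print(stats.mean, stats.std, stats.n)  # fields are accessed by name, not position
```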
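
Point 11 in code: the sticky version below quietly depends on a module-level global, while the non-sticky version takes everything through its declared interface (the threshold and the row layout are arbitrary):

```python
THRESHOLD = 0.5  # module-level global

def filter_rows_sticky(rows):
    """Sticky: breaks if moved to a module where THRESHOLD does not exist."""
    return [r for r in rows if r["score"] > THRESHOLD]

def filter_rows(rows, threshold=0.5):
    """Non-sticky: everything it needs comes in through its declared interface."""
    return [r for r in rows if r["score"] > threshold]

rows = [{"score": 0.2}, {"score": 0.9}]
print(filter_rows(rows, threshold=0.5))
```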
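
And for point 15, the streamed style reads roughly like the sketch below; the synthetic data, the batch size, and the toy one-weight SGD step are stand-ins for a real model and training loop:

```python
import random

def sample_stream(n_samples):
    """Yield one (feature, label) pair at a time instead of materializing a dataset."""
    for _ in range(n_samples):
        x = random.random()
        yield x, 2.0 * x + random.gauss(0.0, 0.1)

def batches(stream, batch_size):
    """Group the stream into fixed-size batches, again lazily (trailing partial batch dropped)."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []

def train(stream, lr=0.01):
    """Consume the batched stream, updating a single weight with a toy SGD step."""
    w = 0.0
    for batch in batches(stream, batch_size=32):
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w

print(train(sample_stream(10_000)))  # should approach the true slope of 2.0
```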

I’ve found the following book to be a great primer and reference for some cool design patterns. I try to adopt them as appropriate in Python:
Head First Design Patterns: A Brain-Friendly Guide, by Eric Freeman and Elisabeth Robson
