My philosophy for data science code in Python

  1. Never hardcode values. Make good use of argparse and YAML configuration files.
  2. Make good use of setuptools. Explicitly specify the versions of the packages you depend on.
  3. Keep an eye on the Kolmogorov complexity. Never write the same code twice. Turn any code block that is used in multiple places into a function and call it from each place. Remember this when extending others’ code, and reuse their code as much as possible.
  4. One function should do only one logical task. This helps when others need to reuse the functionality.
  5. Never create variables, methods, or functions dynamically. If such a case is unavoidable, declare it in the README file with a proper justification of why it is needed.
  6. In general, I do not like code with a lot of try-except blocks. If an exception can be avoided with a “look-before-you-leap” approach, I don’t see much harm in that. But exceptions are NEVER to be silently passed.
  7. Try not to return more than one value from a function. If necessary, use a namedtuple or a dictionary.
  8. Functions should define default values for their inputs whenever possible.
  9. Never use wildcard imports (from … import *). They hide where each name comes from, which makes the code difficult to maintain.
  10. Writing unit tests for each object/function is usually overkill. Find a few use cases that together cover 100% of the code and use those as tests.
  11. Avoid sticky code. Sticky code is code that breaks when moved from one place to another. Usually, it depends on global variables that come from outside the declared interfaces. So, always communicate among objects through established interfaces.
  12. Every class must have a docstring containing the following information: a) what it is supposed to do, b) any implicit but important assumptions (e.g. incoming data must contain this; that service must be on), c) any important notes (e.g. do not call concurrently), d) definitions of the variables left uninitialized in __init__.
  13. Every class method must have a docstring.
  14. I prefer functional programming over object-oriented programming. However, the two can coexist.
  15. Create classes only when necessary (typically to persist context/state, to avoid sticky code, or to establish natural relationships).
  16. Classes should be designed so that their functionality can be at least partially used from any outside function that can supply the required context variables. One way to achieve this is to keep all the major algorithms as static methods with declared input interfaces, and then add a few methods that glue these algorithms together using the context or instance state.
  17. Deep learning, in general, is a streamed optimization process. So I prefer to write deep learning code as a stream of data that is processed through a sequence of generators.
  18. Any mathematical/code-ninja trick should be accompanied by a short comment, no more than one line, explaining what is intended. This improves readability a lot.
  19. Try to follow PEP 8 as long as it does not directly conflict with the above points.
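
To make points 7, 12, and 16 concrete, here is a minimal sketch. The Normalizer class, the Stats type, and every method name are hypothetical and chosen only for illustration. The two algorithms are static methods with explicit inputs, the instance methods only glue them to stored state, and the statistics come back as a single namedtuple.

```python
from collections import namedtuple

# Hypothetical result type: one named value instead of a bare tuple (point 7).
Stats = namedtuple("Stats", ["mean", "std"])


class Normalizer:
    """Standardize numeric sequences using statistics stored at fit time.

    Assumption: inputs are non-empty sequences of numbers.
    Note: call fit() before transform().
    Attributes: stats is None until fit() sets it to a Stats namedtuple.
    """

    def __init__(self, eps=1e-8):
        self.eps = eps
        self.stats = None

    @staticmethod
    def compute_stats(values):
        """Return the mean and population standard deviation of values."""
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        return Stats(mean=mean, std=var ** 0.5)

    @staticmethod
    def apply(values, stats, eps=1e-8):
        """Standardize values with explicitly supplied statistics."""
        return [(v - stats.mean) / (stats.std + eps) for v in values]

    def fit(self, values):
        """Glue method: store the statistics of values on the instance."""
        self.stats = self.compute_stats(values)
        return self

    def transform(self, values):
        """Glue method: standardize values using the stored statistics."""
        if self.stats is None:  # look before you leap (point 6)
            raise RuntimeError("call fit() before transform()")
        return self.apply(values, self.stats, self.eps)
```

An outside function that already holds the statistics can call Normalizer.apply(values, stats) directly, without ever creating an instance.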

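The generator style from point 17 might look like the sketch below. It uses no deep learning library: a one-parameter linear model fitted by plain gradient descent stands in for a real network, and the function names, the toy data, and the learning rate are all assumptions made for illustration.

```python
import random


def read_samples(data, epochs=1, seed=0):
    """Yield (x, y) pairs, reshuffling at the start of every epoch."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        yield from data


def batch(samples, batch_size=2):
    """Group a stream of samples into lists of at most batch_size items."""
    current = []
    for sample in samples:
        current.append(sample)
        if len(current) == batch_size:
            yield current
            current = []
    if current:  # emit the final, possibly smaller, batch
        yield current


def sgd_steps(batches, w=0.0, lr=0.05):
    """Fit y = w * x with one gradient step per batch; yield w after each step."""
    for items in batches:
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in items) / len(items)
        w -= lr * grad
        yield w


# Toy data generated by y = 2 * x (an assumption for this example).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0
for w in sgd_steps(batch(read_samples(data, epochs=50), batch_size=2)):
    pass  # the loop only pulls the stream; each stage does one task
```

Each stage does one logical task and knows nothing about its neighbours, so a stage such as augmentation or logging can be added by wrapping the stream in one more generator.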