21 May 2021

If you have read my previous Python posts "dict() vs {}" and "Performance of all() and any()", you can already imagine that I am interested in performance issues. When you work with large amounts of data, like a natural language corpus, performance improvements are always something to think about. At the end of last year, I once again went looking for a good book with suggestions on how to improve Python code. I wanted a book specifically about Python so that I would find the Python-specific improvements. After some searching, I came across "Fast Python - master the basics to write faster code" by Chris Conlan and bought it as a Christmas present for myself. Long story short: I bought it, I read it and now I want to tell you why I liked it!

 

Let me start with some short facts: The book has 6 chapters across 150 pages. The chapter titles are:

1. Introduction

2. Adding Things

3. Counting Things

4. Sorting Things

5. Declaring Things

6. Miscellaneous Topics

 

The introduction explains how the author wrote the book and how to read it, and it also gives a short introduction to computational complexity. For me, this was a nice way to refresh my knowledge before diving deep into the other chapters. What felt a little bit "weird" at first was the first example. I was like "Why would you ever write it like this? No wonder you can speed it up." Well, a few sentences later I found out that this was fully on purpose: each example starts from a horrible worst case. That is kind of cool, because even if you do not have much experience you can see the small improvements you can make to your code, and what their impact can be.

 

As someone who has been writing Python code for more than 10 years, it was clear to me from the beginning that the chances of learning something new were not that big. Even so, learning that I had already done everything I could to make my code as fast as possible, without parallelization, is a very good feeling. Now you may think: if I didn’t learn much, why do I like it, and why should you read it? Well, there were three small nuggets which I will try to use in the future:

 

  • collections.Counter
    I already knew about this class and that it extends the dict class. I had used it a few times, but sparingly. Why should I use it if I can just work with a normal dict? Apparently, collections.Counter is optimized below the Python level and therefore outperforms counting with a plain dict.
    For me, this means that in the future I will use it whenever I can.
     
  • String manipulation
    In chapter 5, Chris Conlan also takes a look at the complexity and memory management of adding strings together. There are two ways to do it: using the += operator or calling the .join() method on a list of strings. In my case, I usually use the first one when I don’t have a list already, and the second if I do have a list of words. Now, after reading the book, I know that in the future I will try to always use .join() instead of +=. Based on Chris Conlan’s analysis it is clear that .join() is the better solution: it is faster and has a much lower memory overhead.

 

  • List manipulation
    In the same chapter, lists are discussed after strings. Given what he wrote about strings, and what I told you above, you would think that += is as slow as or slower than .extend(). But that is the wrong assumption! For whatever reason, += and .extend() are not the same under the hood, and += is optimized at some level for lists. Now our only issue is to remember which one is faster for which data type ;)
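
The three nuggets above can be tried out with a small sketch like the following. It is not taken from the book: the function names and the sample data are made up for illustration, and the timings it prints depend on the machine and the Python version, so treat them as a rough indication rather than a benchmark.

```python
from collections import Counter
import timeit


# --- Nugget 1: counting with a plain dict vs. collections.Counter ---
def count_with_dict(words):
    """Count occurrences by updating a plain dict in a Python loop."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_with_counter(words):
    """Let Counter do the counting when it is constructed."""
    return Counter(words)


# --- Nugget 2: building a string with += vs. str.join() ---
def concat_with_plus(parts):
    """Append each part to the result with +=."""
    result = ""
    for part in parts:
        result += part
    return result


def concat_with_join(parts):
    """Build the result in a single join() call."""
    return "".join(parts)


# --- Nugget 3: growing a list with += vs. list.extend() ---
def grow_with_iadd(chunks):
    """Append every chunk to the list with +=."""
    out = []
    for chunk in chunks:
        out += chunk
    return out


def grow_with_extend(chunks):
    """Append every chunk to the list with .extend()."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


if __name__ == "__main__":
    # Hypothetical sample data, just large enough to measure something.
    words = ["the", "cat", "sat", "on", "the", "mat"] * 10_000
    parts = [str(i) for i in range(10_000)]
    chunks = [[1, 2, 3]] * 10_000

    experiments = [
        ("dict counting", count_with_dict, words),
        ("Counter counting", count_with_counter, words),
        ("str +=", concat_with_plus, parts),
        ("str.join()", concat_with_join, parts),
        ("list +=", grow_with_iadd, chunks),
        ("list.extend()", grow_with_extend, chunks),
    ]
    for label, func, data in experiments:
        # Total time for 20 calls of each variant.
        seconds = timeit.timeit(lambda: func(data), number=20)
        print(f"{label:<18} {seconds:.4f} s")
```

Both functions in each pair return the same result, so any difference is in speed and memory use only. Measuring on your own data is the safest way to decide which variant to keep.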


If you are working with big data and want to speed up your code: I recommend this book to you. Hopefully some of the examples will give you ideas on how to improve your code.

If you are interested in the topic of code optimization: I recommend this book because it contains great explanations and examples. I’m pretty sure you could learn something new.
