Python Performance: Why 'if not list' is 2x Faster Than Using len()

https://lemmy.world/post/28121911

I write a lot of Python. I hate it when people use “X is more pythonic” as some kind of argument for what is a better solution to a problem. I also have a hang-up with people acting like Python has any form of type safety, instead of just embracing duck typing. This lands us at the following:

The article states that “you can check a list for emptiness in two ways: if not mylist or if len(mylist) == 0”. Already here, a fundamental mistake has been made: You don’t know (and shouldn’t care) whether mylist is a list. These two checks are not different ways of doing the same thing, but two different checks altogether. The first checks whether the object is “falsey” and the second checks whether the object has a well-defined length that is zero. These are two completely different checks, which often (but far from always) overlap. Embrace duck typing; type safe python is a myth.

Isn’t the expected behaviour exactly identical on any object that has __len__ defined:

“By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object.”

It’s not the same, and you kinda answered your own question with that quote. Consider what happens when an object defines both dunder bool and dunder len. It’s possible for dunder len to return 0 while dunder bool returns True, in which case the falsiness of the instance would not depend at all on the value of len.
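A minimal sketch of that situation, using two hypothetical classes (Inbox and Bag are made up for illustration). When both methods exist, truth testing consults __bool__ and never looks at __len__:

```python
class Inbox:
    """Hypothetical container that is truthy even when it holds nothing."""

    def __init__(self, messages=None):
        self.messages = list(messages or [])

    def __len__(self):
        return len(self.messages)

    def __bool__(self):
        # __bool__ takes precedence over __len__ in truth testing.
        return True


class Bag:
    """Hypothetical container that defines only __len__."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def __len__(self):
        return len(self.items)


empty_inbox = Inbox()
empty_bag = Bag()

# Each tuple is (len(x) == 0, not x).
inbox_checks = (len(empty_inbox) == 0, not empty_inbox)  # (True, False)
bag_checks = (len(empty_bag) == 0, not empty_bag)        # (True, True)
```

The two checks agree for Bag and disagree for Inbox, which is the whole point of the objection.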

Exactly as you said yourself: Checking falsiness does not guarantee that the object has a length. There is considerable overlap between the two, and if it turns out that this check is a performance bottleneck (which I have a hard time imagining) it can be appropriate to check for falsiness instead of zero length. But in that case, don’t be surprised if you suddenly get an obscure bug because of some custom object not behaving the way you assumed it would.

I guess my primary point is that we should be checking for what we actually care about, because that makes intent clear and reduces the chance for obscure bugs.

type safe python is a myth

Sure, but type hints provide a ton of value in documenting for your users what the code expects. I use type hints everywhere, and it’s fantastic! Yes, there’s no guarantee that the types are correct, but with static analysis and the assumption that your users want their code to work correctly, there’s a very high chance that the types are correct.

That said, I lie about types all the time. For example, if my function accepts a class instance as an argument, the intention is that the code accept any class that implements the same methods as the one I’ve defined in the parameter list, and you don’t necessarily have to pass an instance of that class in (or one of its sub-classes). But I feel like putting something reasonable in there makes a lot more sense than nothing, and I can clarify in the docstring that I really just need something that looks like that object. One of these days I’ll get around to switching that to Protocol classes to reduce type errors.
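For what it’s worth, here is a rough sketch of what the Protocol approach could look like. The names SupportsSpeak, Duck, Robot and announce are hypothetical; only typing.Protocol and runtime_checkable come from the standard library:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSpeak(Protocol):
    """Anything with a speak() method returning str satisfies this protocol."""

    def speak(self) -> str: ...


class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    # Unrelated to Duck by inheritance, but structurally compatible.
    def speak(self) -> str:
        return "Beep"


def announce(thing: SupportsSpeak) -> str:
    # The hint documents the required method, not a concrete class.
    return thing.speak().upper()


results = [announce(Duck()), announce(Robot())]  # ["QUACK", "BEEP"]
```

With this, the hint says “anything that looks like this” directly, so there is no need to name a concrete class and explain the real intent in the docstring.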

That said, I don’t type hint everything. A lot of private methods and private functions don’t have types, because they’re usually short and aren’t used outside the class/file anyway, so what’s the point?

Type hints are usually great, as long as they’re kept up to date and the IDE interprets them correctly. Recently I’ve had some problems with PyCharm acting up and insisting that matplotlib doesn’t accept numpy arrays, leading me to just disable the type checker altogether.

All in all, I’m a bit divided on type hints, because I’m unsure whether I think the (huge) value added from correct type hints outweighs the frustration I’ve experienced from incorrect type hints. For now I’m leaning towards “type hints are good, as long as you never blindly trust them and only treat them as a coarse indicator of what some dev thought at some point.”

leading me to just disable the type checker altogether.

The better option is to just put # type: ignore on the statements where it gets confused, and add hints for your code. I’ve done that for SQLAlchemy before they got proper type hinting, and it worked pretty well.
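As a small illustration of the per-statement approach, here is a sketch with a hypothetical plot function whose hint is narrower than what it accepts at runtime (this is not the matplotlib or SQLAlchemy API, just a stand-in):

```python
def plot(points: list[float]) -> float:
    """Hypothetical function whose hint is narrower than what it accepts."""
    return max(points)


# A static checker would flag the tuple argument against the list hint.
# The trailing comment silences the checker for this one statement only.
peak = plot((1.0, 3.5, 2.0))  # type: ignore
```

Everything else in the file is still checked, which is the advantage over switching the checker off entirely.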

That said, a type hint is just that, a hint. It shouldn’t be relied on to be 100% accurate (i.e. lots of foo: list should actually be foo: list | None), but if you use a decent static analysis tool, you should catch the worst of it. We use pyright, which is built in to the VSCode extension pylance. It works incredibly well, though it’s a bit too strict in many cases (e.g. when things can be None but generally aren’t).

So yeah, never blindly trust type hints, but do use them everywhere. The more hints you have, the more the static analysis can help, and disabling them on a case-by-case basis is incredibly easy. You’ll probably still get some runtime exceptions that correct type checking could have caught, but it’s a lot better than having a bunch of verbose checks everywhere that make no sense. A good companion to type checks is robust unit test cases with reasonable data (i.e. try to exercise the boundaries of what users can input).

As it stands, we very rarely get runtime exceptions due to poor typing because our type hints are generally pretty good and our unit test cases back that up. Don’t blindly trust it, and absolutely read the docs for anything you plan to use, but as long as you are pretty consistent, you can start making some assumptions about what your data looks like.

I really do agree on all your points, so at the end of the day I think a lot comes down to use-case and personal preference.

My primary use cases for Python are prototyping and as a frontend/scripting tool for software written in C/C++/Fortran. I’ve written/worked on only one larger code base in pure Python, and my personal opinion became that I heavily prefer strictly typed languages once the code base exceeds a certain size. It just feels so much smoother to work with when I have actual guarantees that are enforced by the language.

With that said, we were a bunch of people that are used to using Python for prototyping that developed this larger library, and it would probably have gone a lot better if we actually enforced use of proper type hinting from the start (which we were not used to).

I heavily prefer strictly typed languages once the code base exceeds a certain size

As do I, but we don’t all get to pick our stack.

I use Rust for all my personal projects unless I have a good reason to pick something else. I like pretty much everything about it, from the lack of classes (I hate massive class hierarchies) to the borrow checker to everything being an expression. It feels like I’m getting most of the benefits of functional programming, without being tied down to FP to solve problems.

That said, I think Python is a reasonable choice for large codebases. For simple scripts, I generally don’t bother with type hints. At my current company, our largest codebase is well over 100k lines of Python, so the type hints are absolutely welcome since they help document code I haven’t touched in over a year (if ever). If things get slow, there’s always the option of a native module. But for most things, Python is fast enough, so it’s no big deal. Because of this, I use type hints for anything that might become a larger project. After the initial POC, I’ll go through and update types, fix a bunch of linting warnings/errors, and flesh out the unit tests. That way I have something to build from when I inevitably come back to it in a year or so.

So yeah, I definitely recommend using type hinting. The best time to add type hints is at the start of development, the next best time is now.

The next best time is now

If my Easter break gets boring I might just start cleaning up that Python library… It’s the prime example of something that developed from a POC to a fully functional code base, was left largely unused for about a year, and in just the past few weeks has suddenly seen a lot of use again. Luckily we’re strict about good docstrings, but type hints would have been nice too.

Woo, do it! And add some tests while you’re at it in case those don’t exist.

I found a few bugs just going through and cleaning up missing code coverage. Maybe you’ll find the same!

How does Python know if it’s my list or not?

if isinstance(mylist, list) and not mylist

Problem solved.

Or if not mylist # check if list is empty

You’re checking if mylist is falsey. Sometimes that’s the same as checking if it’s empty, if it’s actually a list, but that’s not guaranteed.
Doesn’t Python treat all empty iterables as false tho? This isn’t unique to python, is it? (though I’m not a programmer…just a dude who writes scripts every now and then)
My point is that the second statement you presented can have the effect of evaluating emptiness of a Sequence (note: distinct from an Iterable), but that only holds true if the target of the conditional IS a sequence. I’m underlining the semantic difference that was elided as a result of falsey evaluation.
Ok, help a noob out. What is the difference between a sequence and an iterable? Is a sequence immutable, like a tuple?

An iterable is just something that can be iterated over, like range(10), or [1, 2, 3].

A sequence, on the other hand, is a Collection that is reversible and supports indexing via __getitem__.

docs.python.org/3/library/collections.abc.html#co…

I know what an iterable is. But I am talking about Type[Iterable], which iirc does not obey falsey eval when empty.

thing: Sequence[Any] iirc is iterable, indexable, and reversible.

thing: Iterable[Any] only guarantees that its iterable - and note that iterating can sometimes have the effect of consuming the iterable (e.g. when working with streaming interfaces)
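A short sketch of that distinction using the abstract base classes from collections.abc (the variable names are made up for illustration):

```python
from collections.abc import Iterable, Sequence

numbers = [1, 2, 3]
squares = (n * n for n in numbers)  # generator: iterable, but not a sequence

list_is_sequence = isinstance(numbers, Sequence)     # True
range_is_sequence = isinstance(range(10), Sequence)  # True
gen_is_iterable = isinstance(squares, Iterable)      # True
gen_is_sequence = isinstance(squares, Sequence)      # False

first_pass = list(squares)   # [1, 4, 9]
second_pass = list(squares)  # [] because the generator was already consumed
```

The second pass coming back empty is the “consuming” effect mentioned above: nothing about Iterable promises that you can iterate twice.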

Not really. Generators have weird truthiness: I don’t remember if they evaluate to true or false, but they cannot be checked for emptiness, so they default to either always true or always false.
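To settle the question: generator objects define neither __bool__ nor __len__, so they fall under the default rule and are always truthy. The sketch below shows that, plus one possible way to test for emptiness without losing an element; peek_empty is a hypothetical helper, not a standard library function:

```python
from itertools import chain


def peek_empty(iterable):
    """Return (is_empty, iterator) without losing the first element."""
    iterator = iter(iterable)
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return True, iterator
    # Put the element we peeked at back in front of the rest.
    return False, chain([first], iterator)


empty = (x for x in [])
truthy_when_empty = bool(empty)  # True, even though it yields nothing

is_empty, _ = peek_empty(x for x in [])
has_no_items, rebuilt = peek_empty(x for x in [1, 2])
remaining = list(rebuilt)  # [1, 2]
```

So `if not some_generator` never fires, which is one more case where “falsy” and “empty” are not the same thing.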
I think you missed the joke 😅
I thought it was funny!
Python likes giving lists.
else: # not my list, it is ourlist
Yea, and then you use “not” with a variable name that does not make it obvious that it is a list, and another person who reads the code thinks it is a bool. Hell, a couple of months later you yourself won’t even understand that it is a list. You should not sacrifice code readability for over-optimization. This is Python after all; I don’t think list lengths will be your bottleneck.
Strongly disagree that not x implies to programmers that x is a bool.
Well, it does not imply it directly per se, since you can “not” many things, but I feel like my first assumption would be that it is used in a bool context.
You can make that assumption at your own peril.
I don’t think they are a minority
If anything len tells you that it is a sequence or a collection, “not” does not tell you that. That I feel like is the main point of my objection.
I would say it depends heavily on the language. In Python, it’s very common that different objects have some kind of Boolean interpretation, so assuming that an object is a bool because it is used in a Boolean context is a bit silly.
if not x then … end is very common in Lua for similar purposes, very rarely do you see hard nil comparisons or calls to typeof (last time I did was for a serializer).

Well, fair enough, but I still like the fact that len makes the aim and the object more transparent on a quick look through the code, which is what I am trying to get at. The supporting argument on bools wasn’t very to the point, I agree.

That being said, is there an application of “not” on other classes which cannot be replaced by some other, more transparent operator (I confess I only know the bool and length contexts)? I would rather have transparently named operators than have to remember what “not” does on ten different types. I like duck typing as much as the next guy, but when it is as opaque as in the case of not, I prefer alternatives. For instance, calling open or read on different objects really does open or read some data, whereas with not on some object, god knows what it does; I would have to memorise each case.

Truthiness is so fundamental: in most languages, all values have a truthiness, whether they are bool or not. Even in C, int x = value(); if (!x) x_is_zero(); is valid and idiomatic.

I appreciate the point that calling a method gives more context cues and potentially aids readability, but in this case I feel like not is the python idiom people expect and reads just fine.

I don’t know, it throws me off but perhaps because I always use len in this context. Is there any generally applicable practical reason why one would prefer “not” over len? Is it just compactness and being pythonic?
I definitely agree that len is the preferred choice for checking the emptiness of an object, for the reasons you mention. I’m just pointing out that assuming a variable is a bool because it’s used in a Boolean context is a bit silly, especially in Python or other languages where any object can have a truthiness value, and where this is commonly utilised.
It is not “assume” as in a conscious “this is probably a bool I will assume so” but more like a slip of attention by someone who is more used to the bool context of not. Is “not integer” or “not list” really that commonly used that it is even comparable to its usage in bool context?

Then I absolutely understand you :)

How common it is 100% depends on the code base and what practices are preferred. In Python code bases where I have a say in decisions, all Boolean checks should be x is True or x is False if x should be a Boolean. In that sense, if I read if x or if not x, it’s an indicator that x doesn’t need to be a Boolean.
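A quick sketch of why that convention carries information; describe is a hypothetical helper. Identity checks against True and False only match the actual bool singletons, while plain truth testing accepts anything:

```python
def describe(x):
    # Identity checks match only the bool singletons.
    if x is True:
        return "exactly True"
    if x is False:
        return "exactly False"
    # Everything else is judged by ordinary truthiness.
    return "truthy" if x else "falsy"


labels = [describe(v) for v in (True, False, 1, 0, [], [0])]
# ["exactly True", "exactly False", "truthy", "falsy", "falsy", "truthy"]
```

Note that 1 == True holds in Python, but 1 is not the True object, so the identity check does not fire for it.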

It does if you are used to sane languages instead of the implicit conversion nonsense C and the “dynamic” languages are doing
I haven’t programmed since college 15 years ago and even I know that 0 == false.

Doesn’t matter what it implies. The entire purpose of programming is to make it so a human doesn’t have to go do something manually.

not x tells me I need to go manually check what type x is in Python.

len(x) == 0 tells me that it’s being type-checked automatically.

That’s just not true:

  • not x - has an empty value (None, False, [], {}, etc)
  • len(x) == 0 - has a length (list, dict, tuple, etc, or even a custom type implementing __len__)

You can probably assume it’s iterable, but that’s about it.
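To make the two bullets above concrete, here is a small sketch that runs both checks over a handful of values. The probe helper is hypothetical and records "TypeError" where len() is not supported:

```python
def probe(x):
    """Return (not x, len(x) == 0), with 'TypeError' where len() is unsupported."""
    try:
        zero_length = len(x) == 0
    except TypeError:
        zero_length = "TypeError"
    return (not x, zero_length)


samples = {
    "None": None,
    "False": False,
    "0": 0,
    "[]": [],
    "{}": {},
    "''": "",
    "[0]": [0],
}
report = {name: probe(value) for name, value in samples.items()}
# None, False and 0 are falsy but have no length;
# [], {} and '' pass both checks; [0] passes neither.
```

Neither check says anything about which type you actually have; they only overlap on empty sized containers.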

But why assume? You can easily just document the type with a type-hint:

    def do_work(foo: list | None):
        if not foo:
            return
        ...

Maybe, but that serves as a very valuable teaching opportunity about what the concept of “empty” is in Python. It’s pretty intuitive IMO, and it can make a lot of things more clear once you understand that.

That said, larger projects should be using type hints everywhere, and that should make the intention here painfully obvious:

    def do_work(foo: list | None):
        if not foo:
            ...  # handle empty list
        ...

That’s obviously not a boolean, but it’s being treated as one. If the meaning there isn’t obvious, then look it up/ask someone about Python semantics.

I’m generally not a fan of learning a ton of jargon/big frameworks to get the benefits of more productivity (e.g. many design patterns are a bit obtuse IMO), but learning language semantics that are used pretty much everywhere seems pretty reasonable to me. And it’s a lot nicer than doing something like this everywhere:

if foo is None or len(foo) == 0:

In context, one can consider it a bool.

Besides, I see C code all the time that treats pointers as bool for the purposes of an if statement. !pointer is very common and no one thinks that means a pointer is exclusively a Boolean concept.

if you’re worried about readability you can leave a comment.
If there is an alternative through which I can achieve the same intended effect and which is a bit safer (because it will verify that it has len implemented), I would prefer that to commenting. Also, if I have to comment every use of not that stands in for a len check, that sounds quite redundant, as len checks are very common.

There is no guarantee that the comment is kept up to date with the code. “Self documenting code” is a meme, but clearly written code is pretty much always preferable to unclear code with a comment, largely because you can actually be sure that the code does what it says it does.

Note: You still need to comment your code, kids.

Comments shouldn’t explain code. Code should explain code by being readable.

Comments are for whys. Why is the code doing the things it’s doing. Why is the code doing this strange thing here. Why does a thing need to be in this order. Why do I need to store this value here.

Stuff like that.

Better yet, a type hint. foo: list | None can be checked by static analysis, # foo is a list isn’t.
In my experience, if you didn’t write the function that creates the list, there’s a solid chance it could be None too, and if you try to check the length of None, you get an error. This is also why returning None when a function fails is bad practice IMO, but that doesn’t seem to stop my coworkers.
Good point. I try to initialize None collections to empty collections in the beginning, but that’s not guaranteed, and len would catch it.
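One way that normalization might look, as a sketch. The function total_length is hypothetical, and Optional[list] is the same hint as list | None, spelled so it also works on older Python versions:

```python
from typing import Optional


def total_length(words: Optional[list] = None) -> int:
    # Normalize once at the boundary so the rest of the function
    # can treat words as a list without further None checks.
    words = [] if words is None else words
    return sum(len(w) for w in words)


no_argument = total_length()             # 0
explicit_none = total_length(None)       # 0
some_words = total_length(["ab", "c"])   # 3
```

This only fits when None and an empty list really mean the same thing to the function.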

Sometimes there’s an important difference between None and []. That’s by far not the most common use, but it does exist (e.g. None could mean “user didn’t supply any data” and [] could mean “user explicitly supplied empty data”).

If the distinction matters, make it explicit:

    if foo is None:
        raise ValueError("foo must be defined for this operation")
    if not foo:
        return None
    for bar in foo:
        ...
    return some_other_value

This way you’re explicit about what constitutes an error vs no data, and the caller can differentiate as well. In most cases though, you don’t need that first check, if not foo can probably just return None or use some default value or whatever, and whether it’s None or [] doesn’t matter.
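Here is a runnable sketch of that pattern, fleshed out into a complete function. The name first_even and its loop body are made up for illustration; only the None check and the emptiness check mirror the snippet above:

```python
from typing import Optional


def first_even(foo: Optional[list]) -> Optional[int]:
    if foo is None:
        # Missing input is treated as a caller error.
        raise ValueError("foo must be defined for this operation")
    if not foo:
        # Explicitly empty input means "no data", which is not an error.
        return None
    for bar in foo:
        if bar % 2 == 0:
            return bar
    return None


empty_result = first_even([])         # None
found_result = first_even([1, 4, 6])  # 4

try:
    first_even(None)
except ValueError as exc:
    none_error = str(exc)
```

The caller can now tell the three cases apart: an exception for missing input, None for no data, and a value otherwise.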

if len(foo) == 0: is bad for a few reasons:

  • TypeError will be raised if it’s None, which is probably unexpected
  • it’s slower
  • it’s longer

If you don’t care about the distinction, handle both the same way. If you do care, handle them separately.
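If you want to check the “it’s slower” point on your own machine, a rough timeit sketch could look like this. The iteration count is arbitrary, and the ratio will vary by interpreter version and hardware, so no expected numbers are given:

```python
import timeit

data = []
namespace = {"data": data}

# Time each check on the same empty list.
t_not = timeit.timeit("not data", globals=namespace, number=200_000)
t_len = timeit.timeit("len(data) == 0", globals=namespace, number=200_000)

ratio = t_len / t_not  # how many times slower the len() check was in this run
print(f"not data: {t_not:.4f}s, len(data) == 0: {t_len:.4f}s, ratio: {ratio:.2f}")
```

Either way, both checks take a tiny fraction of a microsecond, so this is unlikely to matter outside a very hot loop.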

Passing None to a function expecting a list is the error…

That’s why we use type-hinting at my company:

    def do_work(foo: list | None):
        if not foo:
            return
        ...

Boom, self-documenting, faster, and very simple.

len(foo) == 0 also doesn’t imply it’s a list; it could be a dict or any other type that implements __len__. That matters a lot in most cases, so I highly recommend using type hints instead of relying on assumptions like len(foo) == 0 is probably a list operation.

Well, in your case it is not clear whether you intended to branch on the variable foo being None or on the list being empty, which are semantically very different…

That’s why it’s better to explicitly express whether you want an empty collection (len = 0) or a None value.

Well yeah, because I’m explicitly not defining a difference between None and []. In most cases, the difference doesn’t matter.

If I did want to differentiate, I’d use another if block:

    if foo is None:
        ...
    if not foo:
        ...

Explicit is better than implicit. I hate relying on exceptions like len(foo) == 0 raising a TypeError because that’s very much not explicit.

Exceptions should be for exceptional cases, as in, things that aren’t expected. If it is expected, make an explicit check for it.

I don’t really understand the point about exceptions. Yeah, “not foo” cannot throw an exception. But the program should crash if an invalid input is provided. If the function expects an Optional[list], it should be provided with either a list or None, nothing else.

Sure. But is None invalid input in your case, whereas [] is valid? If so, make that check explicit, don’t rely on an implicit check that len(…) does.

When I see TypeError in the logs, I assume the developer screwed up. When I see ValueError in the logs, I assume the user screwed up. Ideally, TypeError should never happen, and every case where it could happen should transform it to another type of exception that indicates where the error actually lies.
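One possible reading of that convention, sketched with a hypothetical parse_quantity function: check the expected bad input explicitly, and re-raise anything that would otherwise surface as a TypeError as a ValueError that points at the input:

```python
def parse_quantity(raw):
    """Hypothetical input parser: surfaces bad user input as ValueError."""
    if raw is None:
        # Expected failure mode, so it gets an explicit check.
        raise ValueError("quantity is required")
    try:
        return int(raw)
    except (TypeError, ValueError) as exc:
        # int([1]) raises TypeError and int("abc") raises ValueError;
        # both are reported as a problem with the input.
        raise ValueError(f"quantity must be a whole number, got {raw!r}") from exc


outcomes = []
for candidate in ("3", None, [1], "abc"):
    try:
        outcomes.append(parse_quantity(candidate))
    except ValueError as exc:
        outcomes.append(type(exc).__name__)
# [3, "ValueError", "ValueError", "ValueError"]
```

Using `from exc` keeps the original exception attached, so the logs still show where the failure really came from.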

The only exceptions I want to see in my code are:

  • exceptions from libraries, such as databases and whatnot, when I do something invalid
  • explicitly raised exceptions

Implicit ones like accessing attributes on None or calling methods that don’t exist shouldn’t be happening in production code.