<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Michael Bassili</title>
        <description>I&apos;m a Cloud Developer Who Loves to Build Cool Stuff With AWS, Azure, Python, and AI.</description>
        <link>https://bassi.li/</link>
        <atom:link href="https://bassi.li/feed.xml" rel="self" type="application/rss+xml"/>
        <pubDate>Fri, 20 Feb 2026 02:47:11 +0000</pubDate>
        <lastBuildDate>Fri, 20 Feb 2026 02:47:11 +0000</lastBuildDate>
        <generator>Jekyll v3.10.0</generator>
        
            <item>
                <title>I Miss Using Em Dashes</title>
                <description>&lt;p&gt;I really miss using em dashes in my writing. Ever since content creators started using ChatGPT to help (or supplement) their writing, em dashes have become indicators of AI use. Students are routinely caught with their pants down as professors flag an essay as AI-generated based on the presence of lists, positive-leaning prose, and em dashes.&lt;/p&gt;

&lt;p&gt;Em dashes can be found everywhere across my personal and professional writing. Nowadays, I find myself avoiding em dashes because I’m afraid that my writing will be flagged as AI-generated and dismissed as slop. I feel like I have to “dumb down” aspects of my writing to convince readers that the words they are skimming were, in fact, written by a human. In turn, this results in a sort of meta-game where I choose my words carefully—typically ensuring that I include the &lt;em&gt;right&lt;/em&gt; amount of grammatical character and/or mistakes—to convince readers that they aren’t wasting their time reading slop on the internet. Writing these two em dashes &lt;em&gt;felt&lt;/em&gt; suspicious because I’m trying to insert them into my writing where readers will least expect ChatGPT to add them.&lt;/p&gt;

&lt;p&gt;I’m curious (and more than a bit worried) that the writing that is being produced these days is being shaped by LLMs, even if an LLM has never touched a particular piece of prose. We are all collectively aware of what slop “feels like” to read, and that means that serious writers are conscious of how their word choice, punctuation, and flow are perceived by readers. The resulting piece of writing has therefore been shaped by the mere presence of consumer-grade LLMs.&lt;/p&gt;

&lt;p&gt;The worst part is that products like ChatGPT can change models under the hood; a new foundation model might drop that overuses something else, like semicolons, leading future articles/books/papers/reports/etc. to shun them for fear of arousing suspicion. As a software engineer, I love LLMs, but I’m unhappy with the amount of &lt;em&gt;soft power&lt;/em&gt; they have over the creatives of the world, especially online. If an em dash fits into one’s writing but they avoid using it out of fear, our AI overlords have won.&lt;/p&gt;
</description>
                <pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/i-miss-using-em-dashes</link>
                <guid isPermaLink="true">https://bassi.li/articles/i-miss-using-em-dashes</guid>
                
                <category>AI</category>
                
                <category>Writing</category>
                
                <category>Art</category>
                
                
            </item>
        
            <item>
                <title>The Singular Devotion of Oppenheimer and the Cost of Absolute Loyalty</title>
                <description>&lt;p&gt;One of my favourite public figures is Robert Oppenheimer, a tragic, passionate scientist who was martyred for politics. JRO reminds me of Naked Snake, another character who suffered for his nation at the expense of both his personal creed and his relationships. I read American Prometheus around the time that the movie Oppenheimer hit theatres, taking copious notes and replaying Metal Gear Solid 3 at the same time. Snake and Oppenheimer both altered their respective worlds, sacrificing themselves in the process. It’s strange to realize how tightly connected the world’s greatest scientists were during the mid-20th century. Names like Niels Bohr, Enrico Fermi, and Albert Einstein often seem separated by time, tucked away into different chapters of physics textbooks. But in reality, they all crossed paths, often literally sitting at the same tables, drinking together, arguing, and shaping the modern world in real time. J. Robert Oppenheimer, the theoretical physicist most associated with the Manhattan Project, wasn’t simply a solitary genius in the desert. He was right in the center of it all, surrounded by peers whose names would later become synonymous with science itself. Oppenheimer’s life wasn’t just a story of intellectual achievement. It was a case study in how absolute dedication to a single pursuit can both elevate and destroy a person. He lived in full commitment to his craft and to his country. The price of that commitment was high.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Man is a creature whose substance is faith.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a quote he often referenced from the Bhagavad-Gita, which he first cited upon the death of Franklin D. Roosevelt. He said, “Man is a creature whose substance is faith. What his faith is, he is.” It became clear that Oppenheimer’s faith had always been tied to science and, later, to his nation. That faith defined him completely. One of the more unsettling revelations in his story is the way the United States deployed the atomic bomb. The common belief is that the bombings of Hiroshima and Nagasaki were necessary to end the war. Oppenheimer himself was surprised to learn that Japan was already close to surrender. The Soviet Union was preparing to enter the Pacific theater, which would have likely ended the war without the need for such devastation. Nevertheless, American leadership chose to proceed with the attacks, targeting cities with large civilian populations to make a global statement.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;His security clearance was revoked after a series of hearings designed not to seek truth but to publicly humiliate him.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Oppenheimer became haunted by this decision. Though he had dedicated years of his life to building the bomb, he began to fear what he had unleashed. He warned against further escalation, particularly the development of even more destructive hydrogen bombs. His warnings were not welcomed. Instead, the government began to treat him as a political liability. His security clearance was revoked after a series of hearings designed not to seek truth but to publicly humiliate him. Key evidence was hidden from his defense team. The outcome had been decided before the hearings even began.&lt;/p&gt;

&lt;p&gt;His story mirrors the arc of Naked Snake in Metal Gear Solid 3, a character punished by the very nation he served. Snake, much like Oppenheimer, carried out his mission with complete loyalty, only to find himself discarded once his usefulness ended. Both men became weapons for their governments. Both were ultimately left behind once their missions became inconvenient. In both cases, the betrayal wasn’t accidental. It was calculated.&lt;/p&gt;

&lt;p&gt;Oppenheimer’s mistakes were not just technical but personal. He had previously cooperated with federal investigators, providing information about friends and colleagues with communist ties. He seemed to believe that his scientific stature would insulate him from consequences. When those same conversations were later used against him, he denied them, but it was too late. The damage was irreversible. He had been naive about how Washington worked. His own words became the rope with which his enemies tied the noose.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“We may be likened to two scorpions in a bottle, each capable of killing the other.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lewis Strauss, a powerful figure in the Atomic Energy Commission, had long sought to discredit Oppenheimer. The political climate of the Red Scare gave Strauss the perfect opportunity to erase Oppenheimer from public life. His downfall wasn’t just political. It was total. He lost his position, his influence, and his role within the scientific community. His health deteriorated. His wife, Kitty, and his daughter, Toni, both died not long after. The man who had once stood at the center of the scientific world died feeling abandoned and erased.&lt;/p&gt;

&lt;p&gt;Oppenheimer once described the nuclear standoff between the United States and the Soviet Union by saying, “We may be likened to two scorpions in a bottle, each capable of killing the other.” He also warned, “You can’t have this kind of war. There just aren’t enough bulldozers to scrape the bodies off the street.” His vision for the future was clear and terrifying. Yet after his fall from grace, he was left without any meaningful role in shaping the world he had helped create.&lt;/p&gt;

&lt;p&gt;The story feels painfully familiar today. As Russia wages war in Ukraine, many analysts have noted parallels between Vladimir Putin and earlier Soviet leaders. There is a passage in &lt;em&gt;American Prometheus&lt;/em&gt; that captures this mindset, describing how Stalin sought to protect his internal empire but was not necessarily seeking external war, knowing that such a conflict could destabilize his regime. The same pattern appears to be playing out again, with internal strife reportedly growing in Russia as the war drags on. Leaders who lash out in fear of losing control often accelerate their own downfall.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[H]e believed his loyalty and achievements would shield him.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Oppenheimer’s life raises an uncomfortable question. What happens to those who devote themselves entirely to serving a cause, only to discover that their sacrifice means nothing in the end? His tragedy wasn’t simply that he created something dangerous. It was that he believed his loyalty and achievements would shield him. In reality, the system he served was eager to discard him the moment he became a liability. His life was a warning, both about the dangers of unchecked technological power and about the brutal cost of idealism in a world that rewards control above all else. Just like Naked Snake, Oppenheimer learned that serving a nation doesn’t guarantee honor. Sometimes, it guarantees exile.&lt;/p&gt;
</description>
                <pubDate>Sun, 06 Jul 2025 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/oppenheimer-cost-of-loyalty</link>
                <guid isPermaLink="true">https://bassi.li/articles/oppenheimer-cost-of-loyalty</guid>
                
                <category>Culture</category>
                
                <category>Literature</category>
                
                <category>History</category>
                
                <category>Games</category>
                
                
            </item>
        
            <item>
                <title>Evaluating AI Systems: From Criteria to Pipelines</title>
                <description>&lt;p&gt;I am reading the book &lt;a href=&quot;https://www.oreilly.com/library/view/ai-engineering/9781098166298/&quot;&gt;AI Engineering by Chip Huyen&lt;/a&gt; for an AI book club at work. 
These notes have been distilled and sanitized for public consumption from Chapter 4 of the book.  AI evaluation is a critical component of AI engineering. 
This chapter mainly covers evaluating AI systems. There are three main components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Evaluation criteria&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Model selection&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Building out your evaluation pipelines&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three actions are needed to confidently build scalable and resilient AI systems.&lt;/p&gt;

&lt;h1 id=&quot;evaluation-criteria&quot;&gt;Evaluation Criteria&lt;/h1&gt;
&lt;p&gt;Evaluation-driven development is the process of understanding how an application will be evaluated before investing the time, money, and resources to build it.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Evaluation is the biggest bottleneck to AI adoption.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Companies need to start with a set of criteria that is specific to the kinds of applications they are trying to develop. An organization might want to choose &lt;em&gt;different&lt;/em&gt; models for different services/components depending on the kinds of things that AI will be doing in production. For example, you may want one model that specializes in providing summarization while another separate model classifies customer responses.&lt;/p&gt;

&lt;p&gt;Multiple-choice questions (MCQs) are a common way to evaluate models, but performance can vary with small changes in the way questions are posed, leading to fragile evaluations. In Chapter 5, there’s a deeper discussion on prompt sensitivity. MCQs also fall short when evaluating generation tasks, such as summarization, translation, and essay writing.&lt;/p&gt;

&lt;h2 id=&quot;generation-capacity&quot;&gt;Generation Capacity&lt;/h2&gt;

&lt;p&gt;OG metrics for models included &lt;strong&gt;fluency&lt;/strong&gt; (e.g., grammar, feel) and &lt;strong&gt;coherence&lt;/strong&gt; (e.g., logical structure of a response). But these two metrics have become insufficient for more modern models, requiring AI engineers to think harder about the ways in which they work with these LLMs. &lt;strong&gt;Natural language generation (NLG)&lt;/strong&gt; metrics have been repurposed to meet the needs of foundation models; nowadays, generated LLM responses are indistinguishable from real human replies, which means that fluency &amp;amp; coherence have become less important overall.&lt;/p&gt;

&lt;p&gt;Factual Consistency: how the model fares against context. A response is considered “correct” if it is supported by the provided context. Local factual consistency specifically evaluates the provided context, while global factual consistency evaluates broad knowledge (e.g., the sky is blue, not green). Verifying “facts” is the hardest part of factual consistency checking because there is a lot of subjectivity and false information embedded inside training data. Consider what response a foundation model should provide if a user asks “what is the most important meal of the day?” Or, “what is the best way to make a new friend?” There are infinite valid answers to these sorts of questions. Wan et al. (2024) found that existing models ignore things like references or neutral tone when deciding what kind of training data is accurate, so newer foundation models have gotten better at discerning what’s “fact” and what’s “fiction” or “opinion.”&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When validating hallucinations, focus on checking for hallucinated niche knowledge, and queries about things that don’t exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;More sophisticated LLM-as-a-judge setups use techniques such as self-verification and knowledge-augmented verification to determine the quality and accuracy of an AI response.&lt;/p&gt;

&lt;p&gt;Search-augmented factuality evaluators can be used to break an output into individual facts and then use a search engine to verify said facts. Textual entailment is then used to determine the relationship between two segments: entailment (the hypothesis can be inferred from the premise), contradiction (the hypothesis contradicts the premise), and neutral (the premise neither entails nor contradicts the hypothesis). Instead of leveraging more general-purpose AI judges, you can train scorer models to specifically identify factual consistency by taking a premise-hypothesis pair as input and producing a predefined entailment label as output. DeBERTa-v3-mnli-fever-anli is a 184-million-parameter model that can be used for such a task.&lt;/p&gt;
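
&lt;p&gt;As a rough sketch of that scorer-model approach, the snippet below runs a premise/hypothesis pair through an off-the-shelf NLI checkpoint using the Hugging Face transformers pipeline. The exact checkpoint name and label set are assumptions based on publicly available DeBERTa-v3 NLI models, so double-check them against the model card before relying on this.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from transformers import pipeline

# Assumption: this public checkpoint and its labels (entailment,
# neutral, contradiction) match the scorer described above.
nli = pipeline(
    &quot;text-classification&quot;,
    model=&quot;MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli&quot;,
)

def consistency(premise, hypothesis):
    # NLI pipelines accept paired inputs as a text/text_pair dict.
    return nli({&quot;text&quot;: premise, &quot;text_pair&quot;: hypothesis}, top_k=None)

context = &quot;The meeting was moved to Friday at 3 p.m.&quot;
claim = &quot;The meeting now happens on Friday.&quot;
print(consistency(context, claim))  # entailment should score highest
&lt;/code&gt;&lt;/pre&gt;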

&lt;h2 id=&quot;safety-of-foundation-models&quot;&gt;Safety Of Foundation Models&lt;/h2&gt;

&lt;p&gt;Companies building customer-facing products must also keep safety at the forefront of their evaluations. Responses containing inappropriate language, harmful recommendations, hate speech, violence, or stereotypes are detrimental to the overall user experience and open organizations up to liability inquiries. Safety evaluations should be ongoing and aligned with customer-specific content guidelines.&lt;/p&gt;

&lt;h2 id=&quot;instruction-following-capability&quot;&gt;Instruction Following Capability&lt;/h2&gt;

&lt;p&gt;Some models follow instructions better than others, and that dramatically affects the quality of outputs for your application. Poor instruction following can directly degrade customer satisfaction and performance metrics.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If the model is bad at following instructions, it doesn’t matter how good your instructions are, the outputs will be bad.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Benchmarks such as INFOBench evaluate a model’s ability to follow content constraints, such as discussion restrictions. However, the verification of expanded instruction types, such as linguistic guidelines and style, cannot (currently) be easily automated. If you instruct a model to use “language appropriate for a young audience,” how do you automatically verify that the output is indeed appropriate? What does “young” even mean? Introducing ambiguity into your requests is a sure-fire way to inject variance into your outputs.&lt;/p&gt;

&lt;p&gt;INFOBench found that GPT-4 is a reasonably reliable and cost-effective evaluator.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;GPT-4 isn’t as accurate as human experts, but it’s more accurate than annotators recruited through Amazon Mechanical Turk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Roleplaying capability evaluation is also tricky to automate. RoleLLM evaluates a model’s ability to emulate a persona using pre-defined similarity scores and AI judges. This might be something worth investigating. In general, though, you should evaluate your roleplaying AI based on style and knowledge, the two key characteristics of these sorts of bots.&lt;/p&gt;

&lt;h2 id=&quot;cost--latency-considerations&quot;&gt;Cost &amp;amp; Latency Considerations&lt;/h2&gt;

&lt;p&gt;AI engineering is a careful balance between model &lt;strong&gt;quality&lt;/strong&gt;, &lt;strong&gt;latency&lt;/strong&gt;, and &lt;strong&gt;cost&lt;/strong&gt;. Most companies will opt for lower-quality models that are faster and cheaper. At scale, even minor latency regressions can degrade customer experience. Pareto optimization of foundation models can be done using public model benchmarks and internal evaluation tools, such as LangSmith. Price can be applied to each benchmark to provide a more holistic view of the opportunity costs associated with using one model over another.&lt;/p&gt;

&lt;p&gt;Latency metrics for models include time-to-first-token, time per token, time between tokens, time per query, and more. Essentially, measuring the deltas between tokens and queries provides AI engineers with a latency benchmark that can be extrapolated across larger requests and multiple turns.&lt;/p&gt;
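
&lt;p&gt;As an illustration of those deltas, here is a tiny timing helper that works against any streaming token iterator; the stream object is a stand-in, not a particular provider’s client.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import time

def stream_latency(stream):
    # Record a timestamp as each token arrives from any iterable.
    start = time.perf_counter()
    stamps = [time.perf_counter() for _ in stream]
    if not stamps:
        return None
    gaps = len(stamps) - 1
    return {
        &quot;ttft_s&quot;: stamps[0] - start,  # time-to-first-token
        &quot;avg_tbt_s&quot;: (stamps[-1] - stamps[0]) / gaps if gaps else 0.0,
        &quot;total_s&quot;: stamps[-1] - start,  # time per query
    }

# Fake stream that yields five tokens with a short delay each:
fake = (time.sleep(0.05) or tok for tok in [&quot;the&quot;, &quot;sky&quot;, &quot;is&quot;, &quot;blue&quot;, &quot;.&quot;])
print(stream_latency(fake))
&lt;/code&gt;&lt;/pre&gt;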

&lt;h1 id=&quot;model-selection&quot;&gt;Model Selection&lt;/h1&gt;

&lt;p&gt;The model selection process can become quite nuanced if you decide to optimize for speed and cost on a per-service basis. When comparing different models, you need to differentiate between &lt;strong&gt;hard attributes&lt;/strong&gt; (what is impossible or impractical for you to change) and &lt;strong&gt;soft attributes&lt;/strong&gt; (what you are able to change). Typically, hard attributes are business requirements, while soft attributes comprise metrics like accuracy, toxicity, and factual consistency, i.e. things that can be massaged through prompt engineering.&lt;/p&gt;

&lt;p&gt;A high-level evaluation workflow looks like this (a toy sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Filter out models whose hard attributes conflict with your desired application.&lt;/li&gt;
  &lt;li&gt;Use benchmarks to narrow down a model based on accuracy, cost, and latency.
    &lt;ul&gt;
      &lt;li&gt;There are also considerations surrounding things like data security and open-access.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Run experiments using your own internal evals to confirm and shortlist viable models.&lt;/li&gt;
  &lt;li&gt;Once selected, continually monitor your selected model with evals and human verification.
    &lt;ul&gt;
      &lt;li&gt;You can compare internal evals to customer CSAT scores to validate production models.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
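
&lt;p&gt;Here is the toy sketch of the filtering and ranking steps above. Every model entry, attribute, and weight is a made-up placeholder meant to show the shape of the workflow, not real benchmark data.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Step 1: filter on hard attributes; step 2: rank on soft attributes.
candidates = [
    {&quot;name&quot;: &quot;model-a&quot;, &quot;on_prem&quot;: True, &quot;accuracy&quot;: 0.81, &quot;cost_per_1k&quot;: 0.40, &quot;p50_ms&quot;: 900},
    {&quot;name&quot;: &quot;model-b&quot;, &quot;on_prem&quot;: False, &quot;accuracy&quot;: 0.88, &quot;cost_per_1k&quot;: 1.20, &quot;p50_ms&quot;: 450},
    {&quot;name&quot;: &quot;model-c&quot;, &quot;on_prem&quot;: True, &quot;accuracy&quot;: 0.78, &quot;cost_per_1k&quot;: 0.10, &quot;p50_ms&quot;: 300},
]

def meets_hard_requirements(m):
    return m[&quot;on_prem&quot;]  # e.g. a data-security requirement you cannot change

def score(m):
    # Weighted accuracy/cost/latency trade-off; tune to your business metrics.
    return m[&quot;accuracy&quot;] - 0.05 * m[&quot;cost_per_1k&quot;] - 0.0001 * m[&quot;p50_ms&quot;]

shortlist = sorted(filter(meets_hard_requirements, candidates), key=score, reverse=True)
print([m[&quot;name&quot;] for m in shortlist])  # ['model-c', 'model-a']
&lt;/code&gt;&lt;/pre&gt;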

&lt;h2 id=&quot;model-build-versus-model-buy&quot;&gt;Model Build Versus Model Buy&lt;/h2&gt;

&lt;p&gt;There will always be a performance and accuracy gap between commercial models and open-source models because there aren’t enough financial incentives to release highly performant models for free. In the real world, organizations typically &lt;em&gt;open-source their weaker models and sell their stronger models.&lt;/em&gt; This creates a gap in the ecosystem of available models; open-source models may be perfectly performant for certain applications, but in general, commercially available models have continued to outpace what’s available for free.&lt;/p&gt;

&lt;p&gt;There’s a cost-benefit analysis that needs to happen when commercial model usage grows. Substantial effort and capital are needed to build, maintain, and serve your own internal models, which means the typical investment required is quite high. One benefit of maintaining your own models is that your business has full control over the model’s training data (potentially leading to more niche models for highly specialized fields) and the outputs (allowing organizations to micro-manage the kinds of outputs their models produce).&lt;/p&gt;

&lt;h2 id=&quot;functionality-of-internal-models&quot;&gt;Functionality Of Internal Models&lt;/h2&gt;

&lt;p&gt;One major benefit to building your own models is specializing the model to your niche requests. An organization can fine-tune scalability, function use, structured outputs, and guardrails to meet their personalized needs. While this may be overkill for most organizations, it’s important to understand the trade-offs between model control and external provider dependency. If a third party removes key functionality, customers must react, whereas those sorts of pivots aren’t a concern with in-house models.&lt;/p&gt;

&lt;h2 id=&quot;benchmarks--data-contamination&quot;&gt;Benchmarks &amp;amp; Data Contamination&lt;/h2&gt;

&lt;p&gt;One concern that has cropped up recently is &lt;em&gt;the saturation of publicly available benchmarks&lt;/em&gt;, so much so that providers like Hugging Face have had to update their benchmarks with fresh examples &amp;amp; evals, and more complex asks. This isn’t the first time that they’ve done this, and they’re set to do it again once the current generation of foundation models saturate the leaderboards with near-identical results. Essentially, public benchmarks need to remain nimble and agile to ensure that their data remains coherent and accurate. The older a leaderboard is, the less valuable its results are in evaluating present-day models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data contamination&lt;/strong&gt; typically happens indirectly in public benchmarks. One example would be using math textbooks to train your model while someone else uses that same textbook to create evals. The benchmark result for this hypothetical model would be inaccurate because we’d be inadvertently using the same training data to evaluate the model. A few ways to deal with data contamination include n-gram overlapping (filtering out sequences of matching tokens in an evaluation sample if they match what was seen in the training data) and perplexity (low perplexity scores mean that the model has likely seen the data before). Note that n-gram overlapping is more accurate but is quite time-consuming and expensive since you’re comparing n-token string subsets between a large training set and a (potentially large) example evaluation set.&lt;/p&gt;
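
&lt;p&gt;A minimal sketch of the n-gram overlap idea, assuming whitespace tokenization and a 13-gram window (both simplifications; real pipelines tokenize properly and hash n-grams to cope with corpus scale):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def ngrams(tokens, n=13):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_text, eval_text, n=13):
    # Flag the eval sample if any of its n-grams also appears verbatim
    # in the training data. Whitespace split and n=13 are assumptions.
    shared = ngrams(train_text.split(), n) &amp;amp; ngrams(eval_text.split(), n)
    return bool(shared)
&lt;/code&gt;&lt;/pre&gt;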

&lt;h1 id=&quot;designing-an-evaluation-pipeline&quot;&gt;Designing An Evaluation Pipeline&lt;/h1&gt;

&lt;p&gt;There are three main steps outlined for designing an evaluation pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluate all components in your desired system&lt;/strong&gt; to determine the necessary attributes of your AI models. Whether your evaluation is per-task, per-turn, or per-intermediate-output, you need to identify the evaluation framework that you’ll use beforehand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create evaluation guidelines.&lt;/strong&gt; This is the most important step in the pipeline. You must define both what the application &lt;em&gt;should do&lt;/em&gt; and what the application &lt;em&gt;shouldn’t do&lt;/em&gt;. The more explicit your guidelines, the more accurate your evaluation will be. There is high variability when guidelines are vague or subjective, so be crystal clear when you build up your evals. Try to tie evaluation metrics to your business metrics. Moreover, try to include examples wherever possible to allow the LLM-as-a-judge to leverage baseline behaviour.&lt;/p&gt;
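
&lt;p&gt;To make “explicit guidelines plus examples” concrete, here is an illustrative (entirely made-up) judge prompt; the rubric, scale, and baseline example are placeholders you would adapt to your own business metrics.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# A made-up LLM-as-a-judge prompt: explicit should / should-not
# guidelines plus one scored example as baseline behaviour.
JUDGE_PROMPT = &quot;&quot;&quot;Score the reply from 1 (bad) to 5 (good).
The reply SHOULD: answer the question using only the given context.
The reply SHOULD NOT: speculate beyond the context or exceed 100 words.

Example
Question: When is the store open?
Context: Open 9-5, Monday to Friday.
Reply: We are open 9 to 5 on weekdays.
Score: 5

Question: {question}
Context: {context}
Reply: {reply}
Score:&quot;&quot;&quot;

prompt = JUDGE_PROMPT.format(
    question=&quot;When is the store open?&quot;,
    context=&quot;Open 9-5, Monday to Friday.&quot;,
    reply=&quot;We never close.&quot;,  # a judge following the rubric should score this low
)
&lt;/code&gt;&lt;/pre&gt;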

&lt;p&gt;&lt;strong&gt;Define evaluation methods and data.&lt;/strong&gt; When selecting an evaluation method, try to tie specialized judges to matching functionality, like using a toxicity classifier to evaluate a bot whose purpose is to deal with hostile customers. When logprobs are available, use them; they are a great metric for determining a model’s confidence in a generated token.
Like a snake eating its own tail, you should also strive to evaluate your own eval pipelines. Ask yourself “is our eval pipeline getting the right signals,” “how reliable is my pipeline overall,” or “how correlated are my metrics” to form a more all-encompassing understanding of your pipeline. Revisit and iterate on your evals regularly; evals are active components of the product and should be treated as such.&lt;/p&gt;
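
&lt;p&gt;For the logprob point, here is a small sketch that turns per-token logprobs (however your provider exposes them) into average-confidence and perplexity signals; the numbers are made up.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math

def confidence_metrics(token_logprobs):
    # Higher average logprob (closer to 0) and lower perplexity
    # indicate higher model confidence in the generated tokens.
    avg_lp = sum(token_logprobs) / len(token_logprobs)
    return {&quot;avg_logprob&quot;: avg_lp, &quot;perplexity&quot;: math.exp(-avg_lp)}

print(confidence_metrics([-0.10, -0.05, -1.20, -0.30]))  # made-up values
&lt;/code&gt;&lt;/pre&gt;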

&lt;h1 id=&quot;final-thoughts-on-ai-evaluation&quot;&gt;Final Thoughts on AI Evaluation&lt;/h1&gt;

&lt;p&gt;Evaluating AI systems isn’t just about metrics — it’s about aligning models with business goals, ensuring user safety, and iterating constantly. Whether you’re using off-the-shelf models or building your own, a strong evaluation pipeline is the backbone of reliable AI systems. This book has been eye-opening and I strongly encourage you, dear reader, to read through it if you build software using AI models.&lt;/p&gt;
</description>
                <pubDate>Thu, 26 Jun 2025 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/evaluating-ai-models</link>
                <guid isPermaLink="true">https://bassi.li/articles/evaluating-ai-models</guid>
                
                <category>AI</category>
                
                <category>Evals</category>
                
                <category>LLM</category>
                
                <category>Development</category>
                
                
            </item>
        
            <item>
                <title>Are We Sacrificing Developer Skills for AI Convenience?</title>
                <description>&lt;p&gt;Last week, I had the pleasure of attending All Things Open in Raleigh, North Carolina. Now, AI may not have been the explicit focus of the conference, but let’s just say it completely stole the show. It felt like every other session was talking about it. &lt;a href=&quot;https://x.com/cmcluck&quot;&gt;Craig McLuckie&lt;/a&gt; from Stacklok covered considerations for securing AI-generated code; &lt;a href=&quot;https://x.com/chrisraygill&quot;&gt;Chris Gill&lt;/a&gt; introduced Firebase Genkit, a framework for building AI-powered applications using RAG (which I learned does not, in fact, refer to a cleaning cloth); and my friend even attended a talk with &lt;a href=&quot;https://x.com/rishabincloud&quot;&gt;Rishab Kumar&lt;/a&gt; on LangChain. All told, &lt;a href=&quot;https://2024.allthingsopen.org/schedule&quot;&gt;there were over 40 talks about AI and its implications at this year’s conference&lt;/a&gt;, and by the end, I felt like I’d attended an AI-themed family reunion.&lt;/p&gt;

&lt;p&gt;Like most developers, I’ve been leaning on GitHub Copilot in VSCode. It’s become my sidekick for writing documentation, generating tests, and fixing logical bugs faster than I can write a for-loop. And I’ve gotten pretty comfortable with its assistance—so much so that I’ve perfected what I call the “Copilot Pause.” What’s the “Copilot Pause,” you ask? It’s the art of typing a few characters and then freezing, like a deer in headlights, waiting for Copilot’s auto-suggestion to pop up and finish my thoughts for me. Time it right, and you’re practically writing code with a trusty AI squire at your side, coding together as the sun sets romantically on another productive day.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“[By] no longer using Copilot, I’m taking my coding skills back.” —Dreams of Code&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But as I nestled deeper into my AI-boosted workflow, I couldn’t ignore a gnawing feeling. Sure, Copilot has transformed my productivity, but I wondered what I might be sacrificing for this newfound convenience. On one shoulder, Copilot whispers suggestions like a guardian angel of efficiency, nudging me forward with pre-generated snippets and ready-made fixes. On the other shoulder, I feel a different presence: the looming worry that my skills might start slipping away, like sand through an hourglass. With Copilot, I’m no longer wrestling with code the way I used to. I’m just validating its output, cruising through the “how” and skimming right over the “why.” It’s a bit like assembling IKEA furniture without the instructions—sure, the end product might look alright, but do I really understand how that drawer slider works?&lt;/p&gt;

&lt;p&gt;These days, a stubborn Kubernetes networking issue no longer drags me down documentation rabbit holes. Instead, I pass Copilot a Helm chart and watch it handle things. It’s efficient, but I’m realizing it’s making me a slightly more passive participant in my own projects.&lt;/p&gt;

&lt;p&gt;At the conference, people were hyped about how these tools make it easier for junior developers to contribute, allowing them to join in before mastering every detail. But here’s the rub: if developers become too reliant on AI, they might miss out on those valuable “figuring it out from scratch” experiences. There’s something irreplaceable about facing down error messages and untangling logic knots. As painful as those hours are, they shape a stronger developer—and skipping those steps might leave us with devs who can contribute to a project but lack a real understanding of what’s happening under the hood.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“As long as job interviews require you to code on a whiteboard or in basic online text editors, you need to remember your basics.” —Vincent Stollenwerk&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, here I am, at a crossroads. AI tools are here to stay, and they’re undeniably powerful. But I worry that my skills—both as a developer and a writer—might start slipping as I let Copilot take more control. And for junior developers, I fear they’ll bridge knowledge gaps with AI rather than with curiosity and good old-fashioned persistence. We could end up with developers who can add to a project but don’t have a clue how it all fits together. And that, to me, is both remarkable and a little terrifying.&lt;/p&gt;

&lt;p&gt;For now, I’ve disabled Copilot auto-complete in VSCode. Instead, I’m using the Copilot chat window when I really need it, like to generate documentation or scaffold a test. It’s definitely clunkier than auto-complete, but that friction feels like it’s keeping my skills sharper. Disabling Copilot was a shock to the system at first—I’d catch myself pausing, waiting for suggestions, only to see my blinking cursor and a blank screen staring back at me. That was a wake-up call. Maybe someday I’ll turn the auto-complete back on, but for now, I’m more interested in preserving my skills than just speeding up my output.&lt;/p&gt;

&lt;p&gt;In writing this, I came across a few developers echoing the same sentiment.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/Wap2tkgaT1Q?si=yc8zkebTFgKuH2GW&quot;&gt;Why I’m no longer using Copilot by Dreams of Code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://anyblockers.com/posts/avoid-the-copilot-pause&quot;&gt;Avoid the copilot pause by Eric Zakariasson&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://vstollen.me/notes/copilot-pause&quot;&gt;The Copilot Pause by Vincent Stollenwerk&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=GkmUwDXvWiQ&quot;&gt;Why I Quit Copilot by ThePrimeagen&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m happy to know that I’m not the only one who’s been concerned about this.&lt;/p&gt;
</description>
                <pubDate>Sat, 02 Nov 2024 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/developer-skills-and-ai-convenience</link>
                <guid isPermaLink="true">https://bassi.li/articles/developer-skills-and-ai-convenience</guid>
                
                <category>Development</category>
                
                <category>AI</category>
                
                <category>Career</category>
                
                
            </item>
        
    </channel>
</rss>