I had a challenge not long ago where I had a dataframe of users and a list of different security groups to which each belonged. I wanted to do some simple analysis on how many groups were represented in the dataframe and how many users belonged to each group. A simple horizontal bar chart would suffice.

To provide a more real life example of my problem and solution, imagine you wanted to do some analysis on the three main UEFA titles–Champions League, Europa League, and UEFA Super Cup–and wanted to know how many English teams won each. You might first start by collecting the title winners for each of the contests into a single dataframe. Following that approach, we now have a dataframe that looks like this:

Our dataframe with “Club” as a string and “title” as a list of strings

Start with a unique set of titles

Since I want my chart to show each UEFA title, let’s get a list of those titles like so:

unique_title_list = list(set([item for sublist in df_combo.title.tolist() for item in sublist if len(item)>0]))

This code performs several operations in a single line:

  1. It converts the title column into a list. Since each value is already a list, the result is a list of lists.
  2. Next, I use some clever list comprehension to iterate into each sublist and then interate into each item in that sublist. The result is one large list of all titles won. Note that I also add a “length greater than 0” test just to make sure I avoid empty strings.
  3. Next, I use Python’s set function to produce a group of just the unique titles.
  4. Finally, I cast the set back to a list.

Count the teams that have won each title

To get the count of teams winning each title, I iterate across my unique list, filter down the dataframe by each title, and count the results:

title_counts = {}

for u in unique_title_list:
	winner_count = df_combo[df_combo.title.apply(lambda t: u in t)].shape[0]
	title_counts[u] = winner_count

Nicely sort your results

A good looking bar chart usually sorts the bars low-to-high or high-to-low, so I take this additional step to sort my results:

title, c = [], []
for k,v in sorted(title_counts.items(), key=lambda x: x[1]):
	title.append(k)
	c.append(v)

Finally, chart the results

Last, I wrote this code to produce a horizontal bar charts showing a count of the English teams winning a UEFA title:

fig, ax = plt.subplots(figsize=(10,6))
_ = ax.barh(title, c)
_ = ax.set_xlabel('Number of English Teams')
_ = ax.set_ylabel('Title')
_ = ax.set_title('Number of English football teams winning UEFA titles')

So, this chart is a little lackluster, but what an accomplishment to have five different English teams winning these titles!