Shots conceded by the best defenses in the Premier League

The Premier League started in August, and 7 matchweeks of the 2024/25 season have already been played. Liverpool currently top the table with 18 points, while Manchester City and Arsenal are tied on 17. Chelsea and Aston Villa are tied for fourth place on 14 points.

Liverpool currently have the best defense in the league, with only 2 goals conceded, while teams like Aston Villa have been leakier, conceding 9 league goals so far and keeping very few clean sheets.

In this article I analyze the 10 teams that have conceded the fewest goals and visualize the kind of chances they allow, to get an intuition for their defensive style of play. I outline the whole analysis process: getting the data, aggregating it, ranking the teams and visualizing the results.

Getting the data

I used the data available on the Football-Data website (football-data.co.uk), which provides detailed match statistics for all the major European leagues (and more). To import a single league with pandas, you can run:

import pandas as pd

df_epl = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/E0.csv")

At this point you can inspect the contents of the file. A few of its rows and columns look like this:

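| Date | HomeTeam | AwayTeam | FTHG | FTAG | AC |
| ---- | -------- | -------- | ---- | ---- | -- |
| 16/08/2024 | Man United | Fulham | 1 | 0 | 8 |
| 17/08/2024 | Ipswich | Liverpool | 0 | 2 | 10 |
| 17/08/2024 | Arsenal | Wolves | 2 | 0 | 2 |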

The file contains many useful statistics. Apart from the full-time goals (FTHG and FTAG for the home and away team), we can access shots (HS, AS), shots on target (HST, AST), corners (HC, AC) and even yellow and red cards (HY, AY, HR, AR).
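Since the rest of the analysis only needs a handful of these columns, one option is to load just those and parse the dates while reading. This is an illustrative sketch, not part of the original workflow: `load_matches` and `COLUMNS` are hypothetical names, it assumes dates use the day/month/four-digit-year format shown above, and it runs on a small made-up CSV instead of the real file.

```python
import io

import pandas as pd

# Columns used in the rest of the analysis (Football-Data naming):
# FTHG/FTAG = full-time home/away goals, HS/AS = home/away shots,
# HST/AST = home/away shots on target.
COLUMNS = ["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "HS", "AS", "HST", "AST"]


def load_matches(source):
    """Load a Football-Data CSV (URL, path or file-like), keeping only COLUMNS."""
    df = pd.read_csv(source, usecols=COLUMNS)
    # Assumption: dates are day-first with a four-digit year, e.g. 16/08/2024.
    df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
    return df


# Hypothetical two-match sample with made-up numbers, standing in for the real file.
sample_csv = io.StringIO(
    "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,HS,AS,HST,AST,HC,AC\n"
    "E0,16/08/2024,Team A,Team B,1,0,14,10,5,2,7,8\n"
    "E0,17/08/2024,Team C,Team A,0,2,7,18,2,6,3,10\n"
)
matches = load_matches(sample_csv)
print(matches[["Date", "HomeTeam", "AwayTeam", "HS", "AS"]])
```

With network access, `load_matches("https://www.football-data.co.uk/mmz4281/2425/E0.csv")` should return the same matches as `df_epl`, restricted to these nine columns.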

The next step is to calculate the goals, shots and shots on target conceded by each team so far, select the 10 teams that have conceded the fewest goals, and create a visualization that gives some intuition about their defensive style.

Calculating goals and shots conceded

To calculate the number of goals and shots conceded by each team, I first define a function that, given a team name as input, calculates the goals and shots conceded by that team in all the matches played so far. Once that is done, I can apply the function to every team in a loop and select the 10 teams that have conceded the fewest goals.

The function that calculates the goals and shots conceded by a single team looks like this:

def calc_goals_conc(team):
    # Keep only the matches in which `team` played, at home or away
    team_matches = df_epl[(df_epl['HomeTeam'] == team) | (df_epl['AwayTeam'] == team)]

    goals_conceded = 0
    shots_conceded = 0
    shots_ot_conceded = 0

    for _, row in team_matches.iterrows():
        if row['HomeTeam'] == team:
            # Playing at home: the opponent's numbers are in the away columns
            shots_ot_conceded += row['AST']
            shots_conceded += row['AS']
            goals_conceded += row['FTAG']
        else:
            # Playing away: the opponent's numbers are in the home columns
            shots_ot_conceded += row['HST']
            shots_conceded += row['HS']
            goals_conceded += row['FTHG']

    return goals_conceded, shots_ot_conceded, shots_conceded

This function takes as input the name of a team, team, and returns:

  • goals_conceded: the number of goals conceded by the team so far
  • shots_ot_conceded: the number of shots on target conceded by the team so far
  • shots_conceded: the total number of shots conceded by the team so far

The calculation follows these steps:

  1. We slice the DataFrame, selecting all rows where the home or away team is the one given as input to the function. We call this smaller DataFrame team_matches, since it contains only the matches played by the team we are analyzing.
  2. We initialize goals_conceded, shots_ot_conceded and shots_conceded to 0.
  3. We loop over team_matches. Since a team concedes whatever its opponent produces, if the team is playing at home we add the value of the AST column to shots_ot_conceded; if it is playing away, we add the value of HST.
  4. In the same loop, we add AS to shots_conceded when the team plays at home, and HS when it plays away.
  5. Likewise, we add FTAG to goals_conceded when the team plays at home, and FTHG when it plays away.
  6. We return goals_conceded, shots_ot_conceded and shots_conceded.

The calc_goals_conc function is now ready to use. For example, to calculate the numbers for Man United we simply call:

calc_goals_conc('Man United')
# output
(8, 29, 85)

The output means that Man United have conceded 8 goals this season and allowed 85 shots, 29 of which were on target.
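The row-by-row loop in calc_goals_conc is easy to follow, but pandas can compute the same totals with boolean masks, without iterrows. The sketch below is an alternative, not the article's method: `calc_goals_conc_vectorized` is a hypothetical name, it takes the DataFrame as an explicit argument instead of reading the global `df_epl`, and the three matches are made up.

```python
import pandas as pd


def calc_goals_conc_vectorized(df, team):
    """Return (goals, shots on target, shots) conceded by `team` in `df`."""
    at_home = df["HomeTeam"] == team
    away = df["AwayTeam"] == team
    # At home the team concedes the away side's numbers; away, the home side's.
    goals = df.loc[at_home, "FTAG"].sum() + df.loc[away, "FTHG"].sum()
    shots_ot = df.loc[at_home, "AST"].sum() + df.loc[away, "HST"].sum()
    shots = df.loc[at_home, "AS"].sum() + df.loc[away, "HS"].sum()
    # Cast NumPy integers to plain Python ints for a clean printout.
    return int(goals), int(shots_ot), int(shots)


# Hypothetical sample with made-up numbers, same columns as the Football-Data file.
matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team C", "Team B"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTHG": [1, 0, 3],
    "FTAG": [0, 2, 1],
    "HS": [14, 7, 12],
    "AS": [10, 18, 9],
    "HST": [5, 2, 6],
    "AST": [2, 6, 4],
})
print(calc_goals_conc_vectorized(matches, "Team A"))  # (0, 4, 17)
```

Called as `calc_goals_conc_vectorized(df_epl, 'Man United')`, it should return the same numbers as `calc_goals_conc('Man United')`.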

Let's now run calc_goals_conc for all the teams. To do this we simply write:

all_teams = set(list(df_epl['HomeTeam']) + list(df_epl['AwayTeam']))

# loop on all teams and save the info about goals/shots conceded
shot_conc_allteams = []
for team in all_teams:
    goal_vs, shot_ot_vs, shot_vs = calc_goals_conc(team)
    shot_conc_allteams.append((team, goal_vs, shot_ot_vs, shot_vs))

We first get the names of all teams by combining the HomeTeam and AwayTeam columns, so that we collect every team that has played at home or away this season, and we turn the result into a set to remove the duplicates.

Next, we create a list, shot_conc_allteams, that will hold the result of calc_goals_conc for each team. Then we loop over the all_teams set, call calc_goals_conc with each team name, and store the team name together with its goals, shots on target and shots conceded.
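The loop calls calc_goals_conc once per team, filtering the whole DataFrame each time. An alternative sketch, not the article's code, builds a "home" view and an "away" view of every match from the conceding team's point of view, stacks them and lets `groupby` add everything up in one pass. `conceded_by_team`, `COLS` and the sample matches are hypothetical.

```python
import pandas as pd

COLS = ["team", "goals", "shots_ot", "shots"]


def conceded_by_team(df):
    """Goals, shots on target and shots conceded by every team, in one pass."""
    # As the home side, a team concedes the away side's goals and shots.
    home = df[["HomeTeam", "FTAG", "AST", "AS"]].set_axis(COLS, axis="columns")
    # As the away side, a team concedes the home side's goals and shots.
    away = df[["AwayTeam", "FTHG", "HST", "HS"]].set_axis(COLS, axis="columns")
    return pd.concat([home, away]).groupby("team").sum()


# Hypothetical sample with made-up numbers, same columns as the Football-Data file.
matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team C", "Team B"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTHG": [1, 0, 3],
    "FTAG": [0, 2, 1],
    "HS": [14, 7, 12],
    "AS": [10, 18, 9],
    "HST": [5, 2, 6],
    "AST": [2, 6, 4],
})
print(conceded_by_team(matches))
```

Chaining `.sort_values(["goals", "shots"]).head(10)` onto `conceded_by_team(df_epl)` mirrors the sort-and-slice step used to pick the top 10 defenses.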

Now it's time to select only the top 10 defenses, that is, the teams that have conceded the fewest goals. To find them, we sort the shot_conc_allteams list and keep its first 10 elements:

shot_conc_allteams.sort(key=lambda x: (x[1], x[3]), reverse=False)
shot_conc_topteams = shot_conc_allteams[:10]

Here the lambda function sorts the list by the 2nd element of each tuple (goals conceded) and, when two teams have conceded the same number of goals, by the 4th element (total shots conceded), so the team that allowed fewer shots ranks higher.
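To see the tie-break at work, here is the same key applied to five rows taken from the results table. Python compares the key tuples element by element, so the shots value only matters when the goals values are equal. Since `reverse=False` is the default, it can be omitted.

```python
# (team, goals conceded, shots on target conceded, shots conceded),
# values taken from the results table in this article.
teams = [
    ("Chelsea", 8, 40, 97),
    ("Man City", 8, 22, 56),
    ("Arsenal", 6, 31, 103),
    ("Nott'm Forest", 6, 26, 93),
    ("Liverpool", 2, 24, 64),
]

# Sort by goals conceded, then by total shots conceded to break ties.
ranked = sorted(teams, key=lambda x: (x[1], x[3]))
print(", ".join(name for name, *_ in ranked))
# Liverpool, Nott'm Forest, Arsenal, Man City, Chelsea
```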

We are now ready to visualize these data.

Visualizing the data

The shot_conc_topteams list can be displayed as a Markdown table by printing each entry as a table row:

shot_conc_topteams = shot_conc_allteams[:10]

print("| Team | Goals Conceded | Shots Conceded | Shots on target conceded |")
print("| ---- | -------------- | -------------- | ------------------------ |")
for team in shot_conc_topteams:
    print(f"| {team[0]} | {team[1]} | {team[3]} | {team[2]} |")

This prints the following table:

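| Team | Goals Conceded | Shots Conceded | Shots on target conceded |
| ---- | -------------- | -------------- | ------------------------ |
| Liverpool | 2 | 64 | 24 |
| Nott’m Forest | 6 | 93 | 26 |
| Arsenal | 6 | 103 | 31 |
| Newcastle | 7 | 113 | 38 |
| Man City | 8 | 56 | 22 |
| Tottenham | 8 | 61 | 23 |
| Man United | 8 | 85 | 29 |
| Fulham | 8 | 92 | 28 |
| Chelsea | 8 | 97 | 40 |
| Aston Villa | 9 | 73 | 25 |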
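If you would rather keep working in pandas after this step, the list of tuples converts directly into a DataFrame. This is an optional sketch: `top_df` and the column names are my own choices, and only three rows from the table are used.

```python
import pandas as pd

# (team, goals, shots on target, shots) tuples in the same layout as
# shot_conc_topteams, with values from the table above.
rows = [
    ("Liverpool", 2, 24, 64),
    ("Nott'm Forest", 6, 26, 93),
    ("Arsenal", 6, 31, 103),
]

top_df = pd.DataFrame(
    rows, columns=["team", "goals_conceded", "shots_ot_conceded", "shots_conceded"]
)
# Reorder to match the printed table: goals, shots, then shots on target.
top_df = top_df[["team", "goals_conceded", "shots_conceded", "shots_ot_conceded"]]
print(top_df.to_string(index=False))
```

pandas also offers `DataFrame.to_markdown`, but it requires the optional tabulate package.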

To visualize these data graphically, I chose a 2D scatter plot with the number of shots conceded on one axis and the number of shots on target conceded on the other.

Shots conceded vs shots on target conceded (top 10 Premier League defenses)
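A scatter plot along these lines can be sketched with matplotlib, using the numbers from the table above. Putting shots conceded on the x-axis, labelling each point with the team name and the function name `plot_shots_conceded` are my assumptions, not necessarily how the chart above was made.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt

# (team, shots conceded, shots on target conceded), from the table above
top10 = [
    ("Liverpool", 64, 24),
    ("Nott'm Forest", 93, 26),
    ("Arsenal", 103, 31),
    ("Newcastle", 113, 38),
    ("Man City", 56, 22),
    ("Tottenham", 61, 23),
    ("Man United", 85, 29),
    ("Fulham", 92, 28),
    ("Chelsea", 97, 40),
    ("Aston Villa", 73, 25),
]


def plot_shots_conceded(rows):
    fig, ax = plt.subplots(figsize=(8, 6))
    xs = [shots for _, shots, _ in rows]
    ys = [shots_ot for _, _, shots_ot in rows]
    ax.scatter(xs, ys)
    # Label each point with the team name, slightly offset from the marker.
    for team, shots, shots_ot in rows:
        ax.annotate(team, (shots, shots_ot), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Shots conceded")
    ax.set_ylabel("Shots on target conceded")
    ax.set_title("Shots conceded vs shots on target conceded")
    return fig


fig = plot_shots_conceded(top10)
# fig.savefig("shots_conceded.png") writes the chart to disk
```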

We can see some clusters in the plot. For example, Man City, Tottenham, Liverpool and Aston Villa sit in the bottom left of the graph, since they concede very few shots (between 56 and 73) and few shots on target. These teams tend to keep the ball for long spells and do not give up many chances. At the opposite end, Newcastle concede the most shots (113) and Chelsea the most shots on target (40). Chelsea are an interesting case: despite conceding fewer shots than Arsenal (97 vs 103), they concede many more shots on target (40 vs 31), more than any other team among the top 10 defenses.

We can also calculate the ratio of shots on target conceded to total shots conceded. The average for the top 10 teams is about 34%. The best team by this metric is Nottingham Forest, with only 28% of the shots they concede being on target, while the worst is Chelsea, with 41%.
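The ratio is simply shots on target conceded divided by total shots conceded. The sketch below recomputes it from the table values. "Average" can mean either the mean of the ten per-team ratios or the pooled ratio (all shots on target over all shots), so it prints both.

```python
# (team, shots conceded, shots on target conceded), from the table above
top10 = [
    ("Liverpool", 64, 24),
    ("Nott'm Forest", 93, 26),
    ("Arsenal", 103, 31),
    ("Newcastle", 113, 38),
    ("Man City", 56, 22),
    ("Tottenham", 61, 23),
    ("Man United", 85, 29),
    ("Fulham", 92, 28),
    ("Chelsea", 97, 40),
    ("Aston Villa", 73, 25),
]

# Share of conceded shots that were on target, per team.
ratios = {team: shots_ot / shots for team, shots, shots_ot in top10}
best = min(ratios, key=ratios.get)
worst = max(ratios, key=ratios.get)
mean_ratio = sum(ratios.values()) / len(ratios)
pooled_ratio = sum(s_ot for _, _, s_ot in top10) / sum(s for _, s, _ in top10)

print(f"best: {best} {ratios[best]:.0%}")     # best: Nott'm Forest 28%
print(f"worst: {worst} {ratios[worst]:.0%}")  # worst: Chelsea 41%
print(f"mean of ratios: {mean_ratio:.1%}, pooled: {pooled_ratio:.1%}")
# mean of ratios: 34.6%, pooled: 34.2%
```

Both readings are close to the 34% figure: the pooled ratio is 286 / 837, about 34.2%, while the unweighted mean of the ten team ratios is about 34.6%.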

If you are interested in this type of analysis, I have written a few books that go into the details of how to get the data, visualize it and train a model to predict football results for the Premier League, La Liga, Serie A, the Bundesliga and the other major European national leagues, complete with code examples.

Check out the books on

Antonio Author of Code a Soccer Betting model in a Weekend and Soccer Betting Coding