
Shots conceded by the best defenses in the Premier League
The Premier League started in August, and 7 match weeks of the 2024/25 season have already been played. Liverpool is currently top of the table with 18 points, with Manchester City and Arsenal tied on 17 points. Chelsea and Aston Villa are tied for fourth place with 14 points.
Liverpool is currently the best defense in the league with only 2 goals conceded, while teams like Aston Villa have been a bit leakier, conceding 9 goals in the league so far and keeping very few clean sheets.
In this article I want to analyze the 10 teams that have conceded the fewest goals and visualize what type of chances they concede, to get an intuition about their defensive style of play. I will outline the whole analysis process, from getting the data to aggregating it, visualizing it and ranking the teams.
Getting the data
I used the data available on the Football-Data website (football-data.co.uk), which provides detailed match statistics for all major European leagues (and more). To import one league with pandas, you can do:

```python
import pandas as pd

df_epl = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/E0.csv")
```
At this point you can take a look at the data in the file; it will look something like this:
Date | HomeTeam | AwayTeam | FTHG | FTAG | … | AC |
---|---|---|---|---|---|---|
16/08/2024 | Man United | Fulham | 1 | 0 | … | 8 |
17/08/2024 | Ipswich | Liverpool | 0 | 2 | … | 10 |
17/08/2024 | Arsenal | Wolves | 2 | 0 | … | 2 |
The file contains a large number of interesting statistics. Apart from full-time goals (FTHG and FTAG), we can access shots (HS and AS), shots on target (HST and AST), corners and even yellow and red cards.
The next step is to select the 10 teams that have conceded the fewest goals, calculate the number of shots and shots on target each of them has conceded so far, and create a visualization that gives us some intuition about the defensive style of these teams.
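If you only need the match identifiers, goals and shots, a minimal sketch like the following loads just those columns and parses the day-first dates. It assumes the Football-Data column codes used throughout this article (HS/AS for shots, HST/AST for shots on target); `load_league` and `COLUMNS` are hypothetical names, not part of any library.

```python
import pandas as pd

# Columns used in this analysis: full-time goals, total shots and shots on target
# for the home (H...) and away (A...) side of each match.
COLUMNS = ["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "HS", "AS", "HST", "AST"]

EPL_2425_URL = "https://www.football-data.co.uk/mmz4281/2425/E0.csv"


def load_league(source=EPL_2425_URL):
    """Read a Football-Data CSV (URL, path or file-like object), keeping only COLUMNS."""
    df = pd.read_csv(source, usecols=COLUMNS)
    # Dates are written day-first, e.g. 16/08/2024.
    df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
    return df


# Usage (downloads the file):
# df_epl = load_league()
```

Restricting `usecols` keeps the DataFrame small and makes it explicit which statistics the rest of the analysis depends on.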
Calculating goals and shots conceded
To calculate the number of goals and shots conceded by each team, I will first define a function that, given a team name as input, calculates the goals and shots conceded by that team in all matches played so far. Once that is done, I can apply the function to all teams in a loop and select only the 10 teams that have conceded the fewest goals.
The function to calculate the goals and shots conceded by a single team looks like this:
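```python
def calc_goals_conc(team):
    # Keep only the matches in which `team` played, at home or away
    team_matches = df_epl[(df_epl['HomeTeam'] == team) | (df_epl['AwayTeam'] == team)]
    goals_conceded = 0
    shots_conceded = 0
    shots_ot_conceded = 0
    for _, row in team_matches.iterrows():
        if row['HomeTeam'] == team:
            # Playing at home: we concede what the away side produced
            shots_ot_conceded += row['AST']
            shots_conceded += row['AS']
            goals_conceded += row['FTAG']
        else:
            # Playing away: we concede what the home side produced
            shots_ot_conceded += row['HST']
            shots_conceded += row['HS']
            goals_conceded += row['FTHG']
    # Convert NumPy integers to plain Python ints for a clean output
    return int(goals_conceded), int(shots_ot_conceded), int(shots_conceded)
```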
This function takes as input the name of a team, `team`, and returns:
- `goals_conceded`: the number of goals conceded by the team so far
- `shots_ot_conceded`: the number of shots on target conceded by the team so far
- `shots_conceded`: the total number of shots conceded by the team so far
The calculation follows these steps:
- We slice the DataFrame, selecting all rows where the home or away team is the one given as input to the function. We call this smaller DataFrame `team_matches`, since it contains only matches played by the team we are analyzing.
- We initialize `goals_conceded`, `shots_ot_conceded` and `shots_conceded` to 0.
- We loop over the `team_matches` DataFrame. If the team is playing at home, we add the value of column `AST` to `shots_ot_conceded`; if it is playing away, we add the value of column `HST`. In other words, what a team concedes is always what its opponent produced.
- In the same loop, if the team is playing at home, we add the value of column `AS` to `shots_conceded`; if it is playing away, we add the value of column `HS`.
- In the same loop, if the team is playing at home, we add the value of column `FTAG` to `goals_conceded`; if it is playing away, we add the value of column `FTHG`.
- We return `goals_conceded`, `shots_ot_conceded` and `shots_conceded`.
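The loop with `iterrows` is easy to follow, but the same totals can be obtained with boolean masks and column sums, without iterating row by row. Here is a minimal sketch, assuming a DataFrame with the Football-Data columns used above; `conceded_by_team` is a hypothetical name and, unlike `calc_goals_conc`, it takes the DataFrame as an explicit argument instead of reading the global `df_epl`.

```python
import pandas as pd


def conceded_by_team(df: pd.DataFrame, team: str) -> tuple[int, int, int]:
    """Return (goals, shots on target, shots) conceded by `team` in `df`."""
    home = df[df["HomeTeam"] == team]  # matches where `team` is the home side
    away = df[df["AwayTeam"] == team]  # matches where `team` is the away side
    # At home the opponent's numbers are in the away columns, and vice versa.
    goals = home["FTAG"].sum() + away["FTHG"].sum()
    shots_ot = home["AST"].sum() + away["HST"].sum()
    shots = home["AS"].sum() + away["HS"].sum()
    return int(goals), int(shots_ot), int(shots)
```

Called as `conceded_by_team(df_epl, 'Man United')`, it should return the same triple as `calc_goals_conc('Man United')`.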
The function is ready to use. For example, to calculate the numbers for Man United we simply call:

```python
calc_goals_conc('Man United')
# output: (8, 29, 85)
```

The output means that Man United have conceded 8 goals this season from 85 shots, 29 of which were on target.
Let’s now calculate these numbers for all teams:

```python
all_teams = set(list(df_epl['HomeTeam']) + list(df_epl['AwayTeam']))

# loop on all teams and save the info about goals/shots conceded
shot_conc_allteams = []
for team in all_teams:
    goal_vs, shot_ot_vs, shot_vs = calc_goals_conc(team)
    shot_conc_allteams.append((team, goal_vs, shot_ot_vs, shot_vs))
```
We first collect the names of all teams by combining the `HomeTeam` and `AwayTeam` columns, so that we are sure to include every team that has played at home or away this season, and turn the result into a `set` to eliminate duplicates.
Next, we create a list, `shot_conc_allteams`, that will hold the results of the `calc_goals_conc` function, one entry per team. Then we loop over the `all_teams` set, call `calc_goals_conc` with each team name and append a tuple `(team, goals, shots on target, shots)` to the list.
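As an alternative to calling a per-team function in a loop, all teams can be handled at once by stacking a "conceded by the home side" table on top of a "conceded by the away side" table and grouping by team. This is a minimal sketch assuming the same Football-Data columns; `conceded_table` is a hypothetical helper.

```python
import pandas as pd


def conceded_table(df: pd.DataFrame) -> pd.DataFrame:
    """One row per team with goals, shots on target and shots conceded."""
    # Home sides concede what the away side produced ...
    home = pd.DataFrame({
        "team": df["HomeTeam"],
        "goals": df["FTAG"],
        "shots_ot": df["AST"],
        "shots": df["AS"],
    })
    # ... and away sides concede what the home side produced.
    away = pd.DataFrame({
        "team": df["AwayTeam"],
        "goals": df["FTHG"],
        "shots_ot": df["HST"],
        "shots": df["HS"],
    })
    long = pd.concat([home, away], ignore_index=True)
    # Sum every match of each team into a single row indexed by team name.
    return long.groupby("team")[["goals", "shots_ot", "shots"]].sum()
```

For example, `conceded_table(df_epl).sort_values(["goals", "shots"]).head(10)` should give the same top 10 selection, including the tie-break on total shots.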
Finally, it’s time to select only the top 10 defenses, i.e. the teams that have conceded the fewest goals. To find them, we sort the list `shot_conc_allteams` and keep only its first 10 elements:

```python
shot_conc_allteams.sort(key=lambda x: (x[1], x[3]))
shot_conc_topteams = shot_conc_allteams[:10]
```

The lambda function sorts the list by the 2nd element of each tuple (the goals conceded) and, when two teams have the same value, by the 4th element (the total shots conceded).
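To see the tie-break at work, here is the same key applied to four rows taken from the results table in this article, in the `(team, goals, shots on target, shots)` layout that `calc_goals_conc` produces. Arsenal and Nottingham Forest both conceded 6 goals, so total shots decide their order. This sketch uses `sorted`, which returns a new list, whereas `.sort` reorders the list in place.

```python
# (team, goals conceded, shots on target conceded, shots conceded)
rows = [
    ("Man City", 8, 22, 56),
    ("Arsenal", 6, 31, 103),
    ("Liverpool", 2, 24, 64),
    ("Nott'm Forest", 6, 26, 93),
]

# Sort by goals conceded first, then by total shots conceded to break ties.
ranked = sorted(rows, key=lambda x: (x[1], x[3]))
print([team for team, *_ in ranked])
# ['Liverpool', "Nott'm Forest", 'Arsenal', 'Man City']
```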
We are now ready to visualize these data.
Visualizing the data
The `shot_conc_topteams` list can be displayed as a Markdown table simply by printing each entry and formatting it as a table row:

```python
print("| Team | Goals Conceded | Shots Conceded | Shots on target conceded |")
print("| ---- | -------------- | -------------- | ------------------------ |")
for team in shot_conc_topteams:
    # tuple layout: (team, goals, shots on target, shots)
    print(f"| {team[0]} | {team[1]} | {team[3]} | {team[2]} |")
```
This gives the following table:
Team | Goals Conceded | Shots Conceded | Shots on target conceded |
---|---|---|---|
Liverpool | 2 | 64 | 24 |
Nott’m Forest | 6 | 93 | 26 |
Arsenal | 6 | 103 | 31 |
Newcastle | 7 | 113 | 38 |
Man City | 8 | 56 | 22 |
Tottenham | 8 | 61 | 23 |
Man United | 8 | 85 | 29 |
Fulham | 8 | 92 | 28 |
Chelsea | 8 | 97 | 40 |
Aston Villa | 9 | 73 | 25 |
We can now plot these data. I chose a 2D scatter plot with the number of shots conceded on one axis and the number of shots on target conceded on the other.
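A minimal matplotlib sketch of such a chart, using the numbers from the table above, might look like this. The styling, the figure size and the names `plot_defenses` and `TOP_DEFENSES` are assumptions for illustration, not the code behind the original figure.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# (team, goals conceded, shots on target conceded, shots conceded), from the table above
TOP_DEFENSES = [
    ("Liverpool", 2, 24, 64),
    ("Nott'm Forest", 6, 26, 93),
    ("Arsenal", 6, 31, 103),
    ("Newcastle", 7, 38, 113),
    ("Man City", 8, 22, 56),
    ("Tottenham", 8, 23, 61),
    ("Man United", 8, 29, 85),
    ("Fulham", 8, 28, 92),
    ("Chelsea", 8, 40, 97),
    ("Aston Villa", 9, 25, 73),
]


def plot_defenses(rows):
    """Scatter shots conceded (x) against shots on target conceded (y), one labeled point per team."""
    fig, ax = plt.subplots(figsize=(8, 6))
    shots = [r[3] for r in rows]
    shots_ot = [r[2] for r in rows]
    ax.scatter(shots, shots_ot)
    for team, _, sot, s in rows:
        # Offset each label slightly so it does not sit on top of its marker.
        ax.annotate(team, (s, sot), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Shots conceded")
    ax.set_ylabel("Shots on target conceded")
    ax.set_title("Top 10 Premier League defenses, 2024/25 (first 7 match weeks)")
    return fig, ax


fig, ax = plot_defenses(TOP_DEFENSES)
# fig.savefig("top_defenses.png", dpi=150)
```

With real data you would pass `shot_conc_topteams` instead of the hard-coded list, since it has the same tuple layout.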
We can see some clusters in the plot. For example, City, Spurs, Liverpool and Villa sit in the bottom left of the graph, since they concede very few shots: they keep the ball for most of the game and tend not to give up many chances. At the opposite end, Newcastle and Chelsea concede the most shots and the most shots on target, respectively. Chelsea is interesting because, despite conceding fewer shots than Arsenal (97 vs 103), they concede many more shots on target (40 vs 31), more than any other team among the top 10 defenses.
We can also calculate the ratio of shots on target conceded to total shots conceded. Pooling all the shots conceded by the top 10 teams, 34% were on target (286 out of 837). By this metric the best team is Nottingham Forest, with only 28% of the shots it conceded being on target, while the worst is Chelsea with 41%.
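A short sketch to reproduce these ratios from the table above. It computes both the pooled ratio (total shots on target over total shots) and the unweighted mean of the ten per-team ratios, since "average" can mean either; here the two differ by about one percentage point. The dictionary name `TOP_DEFENSES` is illustrative.

```python
# Top 10 defenses: team -> (shots on target conceded, shots conceded), from the table above
TOP_DEFENSES = {
    "Liverpool": (24, 64),
    "Nott'm Forest": (26, 93),
    "Arsenal": (31, 103),
    "Newcastle": (38, 113),
    "Man City": (22, 56),
    "Tottenham": (23, 61),
    "Man United": (29, 85),
    "Fulham": (28, 92),
    "Chelsea": (40, 97),
    "Aston Villa": (25, 73),
}

# Share of conceded shots that were on target, per team.
ratios = {team: sot / shots for team, (sot, shots) in TOP_DEFENSES.items()}

# Pooled ratio: all shots on target divided by all shots across the ten teams.
pooled = sum(sot for sot, _ in TOP_DEFENSES.values()) / sum(s for _, s in TOP_DEFENSES.values())
# Unweighted mean of the ten per-team ratios.
mean_of_ratios = sum(ratios.values()) / len(ratios)

best = min(ratios, key=ratios.get)
worst = max(ratios, key=ratios.get)

print(f"pooled: {pooled:.0%}, mean of ratios: {mean_of_ratios:.0%}")
print(f"best: {best} {ratios[best]:.0%}, worst: {worst} {ratios[worst]:.0%}")
# pooled: 34%, mean of ratios: 35%
# best: Nott'm Forest 28%, worst: Chelsea 41%
```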
If you are interested in this type of analysis, I have written a few books where I go into the details of how to get the data, visualize it and train a model to predict football results for the Premier League, La Liga, Serie A, Bundesliga and the other major European national leagues, complete with code examples.