Shots conceded by the best defenses in the top European leagues

With the start of the Bundesliga, all the top European leagues have finally kicked off the 2024/25 season. This means that we can now look at statistics from the first matches and draw some comparisons between leagues and teams. It’s still too early to draw final conclusions, but it is definitely interesting to look at early trends.

In particular, I have noticed that quite a few teams have kept a clean sheet in all matches played so far. How good are their defenses? Is it because they are not conceding shots on target, or because their goalkeepers are world-class? To answer these questions, we need to

  • select the teams that have not conceded a single goal (yet)
  • calculate the number of shots on target conceded
  • rank them by number of shots on target conceded

In this way, the teams with outstanding goalkeepers will be ranked at the top, since they have saved the highest number of shots on target.

In this article, I am going to outline the analysis process, from getting the data to aggregating, visualizing and ranking the teams.

Getting the data

I have used the data available on the football-data.co.uk website. Here you can find detailed match statistics for all major European leagues (and more). To import one league with pandas, you can do:

import pandas as pd

df_epl = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/E0.csv")

At this point you can have a look at the data inside the file, which will look something like this:

Date        HomeTeam    AwayTeam   FTHG  FTAG  AC
16/08/2024  Man United  Fulham     1     0     8
17/08/2024  Ipswich     Liverpool  0     2     10
17/08/2024  Arsenal     Wolves     2     0     2

We have quite a large number of interesting statistics here. Apart from the full-time goals, we can access shots, shots on target, corners and even yellow and red cards.

The next step is to aggregate the data from all the leagues. To do this, we simply import each league into its own DataFrame first, and later concatenate them.

df_epl = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/E0.csv")
df_bundes = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/D1.csv")
df_liga = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/SP1.csv")
df_seriea = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/I1.csv")
df_ligue1 = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/F1.csv")
df_ere = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/N1.csv")
df_pri = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/P1.csv")

Here we have imported the data from the top 7 European leagues (Premier League, Bundesliga, La Liga, Serie A, Ligue 1, Eredivisie and Primeira Liga). We can now concatenate them into a single big DataFrame.

all_leagues = pd.concat([df_epl, df_liga, df_seriea, df_ligue1, df_ere, df_pri, df_bundes])

The all_leagues DataFrame will contain all matches played so far in the 7 leagues mentioned above.
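The seven read_csv calls differ only in the league code, so the import and the concatenation can also be written as a loop. The following is a minimal sketch, not the code used for this article: the names BASE_URL, LEAGUE_CODES, league_url and load_leagues are my own, and the reader argument is an assumption added only so that the function can be tried without network access.

```python
import pandas as pd

# Base of the CSV URLs used in this article
BASE_URL = "https://www.football-data.co.uk/mmz4281"

# League codes as they appear in the URLs above
LEAGUE_CODES = {
    "Premier League": "E0",
    "Bundesliga": "D1",
    "La Liga": "SP1",
    "Serie A": "I1",
    "Ligue 1": "F1",
    "Eredivisie": "N1",
    "Primeira Liga": "P1",
}


def league_url(code, season="2425"):
    """Build the CSV URL for one league code and season."""
    return f"{BASE_URL}/{season}/{code}.csv"


def load_leagues(codes, reader=pd.read_csv):
    """Read one CSV per league code and stack them into one DataFrame.

    `reader` is a hypothetical hook: it defaults to pd.read_csv, but any
    function that takes a URL and returns a DataFrame can be passed in.
    """
    frames = [reader(league_url(code)) for code in codes]
    # ignore_index=True gives the combined DataFrame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Requires network access:
# all_leagues = load_leagues(LEAGUE_CODES.values())
```

Adding another league then only requires a new entry in LEAGUE_CODES.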

We are now ready to calculate the total number of goals conceded by each team so far, and select only the teams that have not conceded a goal yet.

Calculating goals conceded

To calculate the number of goals conceded by each team, I will first define a function that, given a team name as input, calculates the goals and the shots on target conceded by that team in all matches played so far. Once I have done that, I can apply the function to all teams in a loop and select only the ones that have conceded 0 goals.

The function for a single team looks like this:

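def calc_shotstop(team):
    # all matches in which the team played, home or away
    team_matches = all_leagues[(all_leagues['HomeTeam']==team) | (all_leagues['AwayTeam']==team)]
    shots_ot_conceded = 0
    goals_conceded = 0
    for idx, row in team_matches.iterrows():
        if row['HomeTeam'] == team:
            # at home, what we concede is in the away columns
            shots_ot_conceded += row['AST']
            goals_conceded += row['FTAG']
        else:
            # away, what we concede is in the home columns
            shots_ot_conceded += row['HST']
            goals_conceded += row['FTHG']

    # share of shots on target that did not end up as goals
    if shots_ot_conceded > 0:
        shot_stop_ratio = 1 - goals_conceded/shots_ot_conceded
    else:
        shot_stop_ratio = 0

    # league code, read from the first match rather than from the leftover loop variable
    div = team_matches['Div'].iloc[0] if len(team_matches) > 0 else None
    return shot_stop_ratio, goals_conceded, shots_ot_conceded, div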

This function takes as input the name of a team, team, and outputs

  • shot_stop_ratio: the fraction of shots on target conceded that did not result in a goal
  • goals_conceded: the number of goals conceded by the team so far
  • shots_ot_conceded: the number of shots on target conceded by the team so far
  • div: the division the team plays in

The calculation follows these steps:

  1. We slice the DataFrame, selecting all rows where the home or away team is the one given as input to the function. We call this smaller DataFrame team_matches, since it contains only the matches played by the team we are analyzing.
  2. We initialize goals_conceded and shots_ot_conceded to 0.
  3. We loop over the team_matches DataFrame and, if the team is playing at home, we add to shots_ot_conceded the value of the column AST. If the team is playing away, we add to shots_ot_conceded the value of the column HST.
  4. In the same loop, in case the team is playing at home, we add to goals_conceded the value of the column FTAG. If the team is playing away, we add to goals_conceded the value of the column FTHG.
  5. We calculate the shot_stop_ratio as 1 minus the ratio between goals_conceded and shots_ot_conceded. In case shots_ot_conceded is 0, we set shot_stop_ratio to 0.
  6. We return the shot_stop_ratio, goals_conceded, shots_ot_conceded and div, which is the code of the league.
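The same steps can also be done for all teams at once, without a row-by-row loop. The sketch below is an alternative, not the function used in this article: defensive_summary is a name of my own, and the toy matches (teams A, B, C in a league "X0") are invented purely to show the mechanics. It keeps the convention of setting the ratio to 0 when no shots on target were conceded.

```python
import pandas as pd


def defensive_summary(matches):
    """Goals and shots on target conceded per team, plus the shot stop ratio."""
    # What the home team concedes is stored in the away columns, and vice versa
    home = matches[["Div", "HomeTeam", "FTAG", "AST"]].rename(
        columns={"HomeTeam": "team", "FTAG": "goals_conceded", "AST": "shots_ot_conceded"}
    )
    away = matches[["Div", "AwayTeam", "FTHG", "HST"]].rename(
        columns={"AwayTeam": "team", "FTHG": "goals_conceded", "HST": "shots_ot_conceded"}
    )
    # One row per team per match, then sum per team
    long_format = pd.concat([home, away], ignore_index=True)
    summary = long_format.groupby(["team", "Div"], as_index=False)[
        ["goals_conceded", "shots_ot_conceded"]
    ].sum()
    # 1 - goals / shots on target, with 0 when no shots on target were conceded
    ratio = 1 - summary["goals_conceded"] / summary["shots_ot_conceded"]
    summary["shot_stop_ratio"] = ratio.where(summary["shots_ot_conceded"] > 0, 0.0)
    return summary


# Invented toy data, only to exercise the function
toy_matches = pd.DataFrame({
    "Div": ["X0", "X0", "X0"],
    "HomeTeam": ["A", "B", "C"],
    "AwayTeam": ["B", "C", "A"],
    "FTHG": [1, 0, 2],
    "FTAG": [0, 0, 1],
    "HST": [4, 2, 5],
    "AST": [3, 1, 2],
})

toy_summary = defensive_summary(toy_matches)
print(toy_summary)
```

In the toy data, team A concedes 0 goals from 3 shots on target at home and 2 goals from 5 shots on target away, so its ratio is 1 - 2/8 = 0.75.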

This function is ready to use. For example, if we want to calculate the numbers for Man United we will simply call

calc_shotstop('Man United')
# output
(0.7142857142857143, 2, 7, 'E0')

The output means that the shot_stop_ratio for Man Utd was about 71%: they have conceded 2 goals so far from 7 shots on target, 5 of which were saved by Onana.
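The numbers in the returned tuple can be checked by hand. This small snippet only repeats the arithmetic of the function on the Man United values shown in the output above (2 goals, 7 shots on target); the variable saves is my own addition.

```python
goals_conceded = 2
shots_ot_conceded = 7

# Same formula as in calc_shotstop
shot_stop_ratio = 1 - goals_conceded / shots_ot_conceded
# Shots on target that did not become goals
saves = shots_ot_conceded - goals_conceded

print(f"{shot_stop_ratio:.1%}", saves)  # prints: 71.4% 5
```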

Let’s now calculate the above for all the teams. To do this we simply write

#calc shotstop ratio for all teams
# get all teams
all_teams = set(list(all_leagues['HomeTeam']) + list(all_leagues['AwayTeam']))

# loop on all teams and save the shotstop ratio
shotstop_allteams = []
for team in all_teams:
    shot_stop, goals, shots_ot, div = calc_shotstop(team)
    shotstop_allteams.append((team, shot_stop, goals, shots_ot, div))

We first get the names of all teams by collecting all the names in the HomeTeam and AwayTeam columns. In this way we are sure to include every team that has played home or away this season. We turn the result into a set to eliminate the duplicates.

The second step is to create a list shotstop_allteams that will contain the results of the calc_shotstop function, one entry for each team. Finally, we loop over the all_teams set and invoke the calc_shotstop function, giving it the team name as input.
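Since each entry of shotstop_allteams is a plain tuple, it can be handy to turn the list into a DataFrame with named columns. The sketch below assumes the tuple layout produced by the loop (team, ratio, goals, shots on target, division) and uses three rows taken from the outputs shown in this article; the column names and the variable names shotstop_sample and shotstop_df are my own.

```python
import pandas as pd

# Three entries in the same layout as shotstop_allteams
shotstop_sample = [
    ("Man United", 0.7142857142857143, 2, 7, "E0"),
    ("Arsenal", 1.0, 0, 6, "E0"),
    ("Juventus", 1.0, 0, 2, "I1"),
]

# Hypothetical column names, one per tuple position
columns = ["team", "shot_stop_ratio", "goals_conceded", "shots_ot_conceded", "div"]
shotstop_df = pd.DataFrame(shotstop_sample, columns=columns)

print(shotstop_df)
```

With named columns, later steps can refer to goals_conceded or shots_ot_conceded instead of positions inside the tuple.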

We are now ready to visualize these data.

Visualizing the data

The final step before visualizing the data is to keep only those teams that have conceded exactly 0 goals.

shotstop_allteams_cs = [x for x in shotstop_allteams if x[2]==0]

Here we save into a new list shotstop_allteams_cs (where cs stands for clean sheet) only the elements of shotstop_allteams that have their third element (the goals conceded) equal to 0.

We can now print shotstop_allteams_cs and we will see the names of the teams that have conceded 0 goals.

shotstop_allteams_cs

# output
[('For Sittard', 1.0, 0, 8, 'N1'),
 ('Arsenal', 1.0, 0, 6, 'E0'),
 ('Nantes', 1.0, 0, 6, 'F1'),
 ('Famalicao', 1.0, 0, 5, 'P1'),
 ('Monaco', 1.0, 0, 5, 'F1'),
 ('Lens', 1.0, 0, 5, 'F1'),
 ('Liverpool', 1.0, 0, 4, 'E0'),
 ('AZ Alkmaar', 1.0, 0, 4, 'N1'),
 ('RB Leipzig', 1.0, 0, 4, 'D1'),
 ('Porto', 1.0, 0, 4, 'P1'),
 ('Heidenheim', 1.0, 0, 3, 'D1'),
 ('Lille', 1.0, 0, 2, 'F1'),
 ('Juventus', 1.0, 0, 2, 'I1'),
 ('Dortmund', 1.0, 0, 2, 'D1')]

If we want to sort the list of teams above so that the ones that conceded the most shots on target come first, we can do the following.

shotstop_allteams_cs.sort(key=lambda x: x[3], reverse=True)

Here the key parameter is a lambda function that tells sort to use the fourth element of each tuple (index 3, the number of shots on target conceded) as the metric to rank the elements of shotstop_allteams_cs, and reverse=True puts the highest values first.
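Numeric positions such as x[2] and x[3] are easy to mix up. As a possible alternative, the filter and the sort can be written with named fields. This is only a sketch: the TeamDefense class and the variable names are my own, and the four sample rows are taken from the outputs shown in this article.

```python
from typing import NamedTuple


class TeamDefense(NamedTuple):
    """One entry of shotstop_allteams, with a name for each position."""
    team: str
    shot_stop_ratio: float
    goals_conceded: int
    shots_ot_conceded: int
    div: str


sample = [
    ("Man United", 0.7142857142857143, 2, 7, "E0"),
    ("Juventus", 1.0, 0, 2, "I1"),
    ("Arsenal", 1.0, 0, 6, "E0"),
    ("Liverpool", 1.0, 0, 4, "E0"),
]
rows = [TeamDefense(*entry) for entry in sample]

# Keep only the clean sheet teams
clean_sheets = [r for r in rows if r.goals_conceded == 0]

# Most shots on target conceded first
ranked = sorted(clean_sheets, key=lambda r: r.shots_ot_conceded, reverse=True)

for r in ranked:
    print(r.team, r.shots_ot_conceded)
```

A NamedTuple still behaves like a tuple, so code that indexes by position keeps working.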

It’s now time to visualize the data. Below you can see a bar chart that includes all the teams that have not conceded a goal so far this season, ordered by the number of shots on target conceded. The teams with the best goalkeepers should appear at the top: their defenses have conceded shots, but their goalkeepers have stopped all of them. We can see that Arsenal, with Raya in goal, are in second place, and Nantes’ Lafont also seems to be quite a good shot stopper.

Shots on target conceded by clean sheet teams (major Euro leagues)
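A chart like this one can be drawn with matplotlib. The snippet below is a sketch under my own assumptions (horizontal bars, default colors, a headless backend), not necessarily how the original figure was produced. The team names and shot counts are the ones printed in the shotstop_allteams_cs output above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the script also runs without a display
import matplotlib.pyplot as plt

# (team, shots on target conceded), already sorted from most to fewest
clean_sheet_teams = [
    ("For Sittard", 8), ("Arsenal", 6), ("Nantes", 6), ("Famalicao", 5),
    ("Monaco", 5), ("Lens", 5), ("Liverpool", 4), ("AZ Alkmaar", 4),
    ("RB Leipzig", 4), ("Porto", 4), ("Heidenheim", 3), ("Lille", 2),
    ("Juventus", 2), ("Dortmund", 2),
]
teams = [team for team, _ in clean_sheet_teams]
shots = [shots_ot for _, shots_ot in clean_sheet_teams]

fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(teams, shots)
ax.invert_yaxis()  # first team in the list at the top of the chart
ax.set_xlabel("Shots on target conceded")
ax.set_title("Shots on target conceded by clean sheet teams (major Euro leagues)")
fig.tight_layout()

# To save the chart to a file:
# fig.savefig("clean_sheet_shots.png")
```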

I have written a few books where I go into the details of how to get the data, visualize and train a model to predict football results for the Premier League, La Liga, Serie A, Bundesliga and the other major European national tournaments, complete with code examples.

Check out the books on

Antonio Author of Code a Soccer Betting model in a Weekend and Soccer Betting Coding