Shots conversion-Best teams in the top European Soccer Leagues

The 2024/25 season is under way in the main European soccer leagues, and some trends are already visible in the data. Most teams have played 4 or 5 matches, so it’s still too early to draw firm conclusions, but the early trends are interesting to look at.

One of the things I have always liked to analyze is how well teams convert chances. In other words, what percentage of shots turn into goals, and which teams are the most efficient at converting chances? To answer this question, I will follow a simple 3-step approach. For each of the teams in the top European leagues I will:

  • calculate the number of shots made
  • calculate the number of goals scored
  • get the ratio goals scored/shots made and rank teams from the highest to the lowest

In this way, the teams that are good at converting chances will be ranked at the top, and probably those teams have some of the best strikers around.

In the article, I am going to outline the analysis process, from getting the data, to aggregating, visualizing and ranking the teams. You can skip the part about getting the data if you are already familiar with that.

Getting the data

I have used the data available on the football-data.co.uk website. Here you can find the detailed match statistics of all major European leagues (and more). To import one league with pandas, you can do:

import pandas as pd

df_epl = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/E0.csv")

At this point you can have a look at the data inside the file, and it will look something like this:

Date HomeTeam AwayTeam FTHG FTAG AC
16/08/2024 Man United Fulham 1 0 8
17/08/2024 Ipswich Liverpool 0 2 10
17/08/2024 Arsenal Wolves 2 0 2

The file contains a large number of interesting statistics. Apart from the full-time goals (FTHG and FTAG), we can access shots, shots on target, corners and even yellow and red cards.

The next step is to aggregate the data from all European leagues. To do this, we will simply import each league into its own DataFrame first, and then concatenate them.

df_epl = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/E0.csv")
df_bundes = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/D1.csv")
df_liga = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/SP1.csv")
df_seriea = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/I1.csv")
df_ligue1 = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/F1.csv")
df_ere = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/N1.csv")
df_pri = pd.read_csv("https://www.football-data.co.uk/mmz4281/2425/P1.csv")

Here we have imported the data from the top 7 European leagues (Premier League, Bundesliga, La Liga, Serie A, Ligue 1, Eredivisie and Primeira Liga). We can now concatenate them into a single big DataFrame.

all_leagues = pd.concat([df_epl, df_liga, df_seriea, df_ligue1, df_ere, df_pri, df_bundes])

The all_leagues DataFrame will contain all matches played so far in the 7 leagues mentioned above.
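The seven read_csv calls above can also be written as a loop over the league codes. The snippet below is only a sketch of that idea: the names BASE_URL, LEAGUE_CODES and load_leagues are hypothetical helpers introduced here, and the two in-memory files contain made-up rows so that the example runs without a network connection.

```python
import io

import pandas as pd

# URL pattern and league codes taken from the read_csv calls in this article
BASE_URL = "https://www.football-data.co.uk/mmz4281/2425/{code}.csv"
LEAGUE_CODES = ["E0", "D1", "SP1", "I1", "F1", "N1", "P1"]


def load_leagues(sources):
    """Read every CSV source (URL, path or file-like object) and stack the rows."""
    frames = [pd.read_csv(source) for source in sources]
    return pd.concat(frames, ignore_index=True)


league_urls = [BASE_URL.format(code=code) for code in LEAGUE_CODES]
# all_leagues = load_leagues(league_urls)  # uncomment to download the seven files

# Offline check with two tiny in-memory files (made-up rows, not real results)
demo_sources = [
    io.StringIO("Div,HomeTeam,AwayTeam,FTHG,FTAG,HS,AS\nE0,Team A,Team B,2,1,10,5\n"),
    io.StringIO("Div,HomeTeam,AwayTeam,FTHG,FTAG,HS,AS\nD1,Team C,Team D,0,0,7,9\n"),
]
demo = load_leagues(demo_sources)
print(demo[["Div", "HomeTeam", "AwayTeam"]])
```

Passing ignore_index=True gives the combined DataFrame a fresh 0..n-1 index, which differs slightly from the plain pd.concat call above, where each league keeps its own row numbers.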

We are now ready to calculate the number of goals scored and shots taken by each team so far, and from those the shot conversion rate.

Calculating the conversion rate

To calculate the shot conversion rate of each team, I will first define a function that, given a team name as input, calculates the number of goals scored by the team, the number of shots made and the ratio of these two metrics. Once I have done that, I can apply that function to all teams in a loop.

The function to calculate the number of goals scored and shots made by a single team looks like this:

def calc_shotacc(team):
    # calculate the rate of conversion of total shots into goals

    team_matches = all_leagues[(all_leagues['HomeTeam']==team) | (all_leagues['AwayTeam']==team)]
    shots_made = 0
    goals_made = 0
    for idx, row in team_matches.iterrows():
        if row['HomeTeam'] == team:
            shots_made += row['HS']
            goals_made += row['FTHG']
        else:
            shots_made += row['AS']
            goals_made += row['FTAG']

    if shots_made>0:
        shot_acc = float(goals_made)/shots_made
    else:
        shot_acc = 0

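    # take the league code from the first match; None if the team has no matches
    div = team_matches['Div'].iloc[0] if len(team_matches) > 0 else None
    return shot_acc, goals_made, shots_made, div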

This function takes as input the name of a team, team, and outputs

  • goals_made: the number of goals scored by the team so far
  • shots_made: the number of shots taken by the team so far
  • shot_acc: the ratio between the number of goals scored and shots taken
  • div: the division the team plays in

The calculation follows these steps:

  1. We slice the DataFrame selecting all rows where the name of the home or away team was the one given as input to the function. We call team_matches this smaller DataFrame, since it contains only matches played by the team we are analyzing.
  2. We initialize shots_made and goals_made to 0.
  3. We loop on the team_matches DataFrame and, in case the team is playing at home, we add to shots_made the value of the column HS. If the team is playing away, we add to shots_made the value of the column AS.
  4. In the same loop, in case the team is playing at home, we add to goals_made the value of the column FTHG. If the team is playing away, we add to goals_made the value of the column FTAG.
  5. We calculate the shot_acc as the ratio between goals_made and shots_made. In case shots_made is 0, we set shot_acc to 0.
  6. We return the shot_acc, goals_made, shots_made and div, which is the code of the league.

This function is ready to use. For example, if we want to calculate the numbers for Man United we will simply call

calc_shotacc('Man United')
# output
(0.07352941176470588, 5, 68, 'E0')

The output means that the shot_acc for Man United was about 7.4%: they have scored 5 goals so far from a total of 68 shots.
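The same numbers can also be obtained for every team at once, without looping over rows. The snippet below is a sketch of one possible vectorized alternative, not the method used in the rest of the article: conversion_table is a hypothetical name, and the two matches in sample_matches are made up purely to make the example runnable. It assumes the DataFrame has the Div, HomeTeam, AwayTeam, FTHG, FTAG, HS and AS columns used above.

```python
import pandas as pd


def conversion_table(matches):
    """Goals, shots and goals-per-shot for every team, using groupby."""
    # One row per team per match: first the home sides, then the away sides
    home = matches[["Div", "HomeTeam", "FTHG", "HS"]].rename(
        columns={"HomeTeam": "team", "FTHG": "goals", "HS": "shots"}
    )
    away = matches[["Div", "AwayTeam", "FTAG", "AS"]].rename(
        columns={"AwayTeam": "team", "FTAG": "goals", "AS": "shots"}
    )
    long_format = pd.concat([home, away], ignore_index=True)

    # Sum goals and shots per team, then take the ratio (0 when there are no shots)
    table = long_format.groupby(["Div", "team"], as_index=False)[["goals", "shots"]].sum()
    table["shot_acc"] = (table["goals"] / table["shots"]).where(table["shots"] > 0, 0.0)
    return table.sort_values(["shot_acc", "goals"], ascending=False, ignore_index=True)


# Made-up matches, only to show the shape of the result
sample_matches = pd.DataFrame({
    "Div": ["E0", "E0"],
    "HomeTeam": ["Team A", "Team B"],
    "AwayTeam": ["Team B", "Team A"],
    "FTHG": [2, 0],
    "FTAG": [1, 1],
    "HS": [10, 4],
    "AS": [5, 10],
})
table = conversion_table(sample_matches)
print(table)
```

In this toy data Team A scores 3 goals from 20 shots (0.15) and Team B 1 goal from 9 shots, so Team A is ranked first. With the real data, conversion_table(all_leagues) should give the same goals and shots per team as calling calc_shotacc in a loop.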

Let’s now calculate the above for all the teams. To do this we simply write

# calculate the shot conversion ratio for all teams
# get all teams
all_teams = set(list(all_leagues['HomeTeam']) + list(all_leagues['AwayTeam']))

# loop on all teams and save the shot conversion ratio
shotacc_allteams = []
for team in all_teams:
    shot_acc, goals, shots, div = calc_shotacc(team)
    shotacc_allteams.append((team, shot_acc, goals, shots, div))

We first get the names of all teams by collecting all the names in the HomeTeam and AwayTeam columns. In this way we are sure to include every team that has played home or away this season. We transform the result into a set to eliminate the duplicates.

The second step is to create a list, shotacc_allteams, that will contain the results of the calc_shotacc function, one entry for each team. Finally, we loop on the all_teams set and invoke the calc_shotacc function, giving it the team name as input.

We are now ready to visualize these data.

Visualizing the data

Before visualizing the data, we want to limit the number of teams to the best ones. To sort the list of teams above from the highest conversion ratio to the lowest, we can do the following.

shotacc_allteams.sort(key=lambda x: (x[1], x[3]), reverse=True)

Here the key parameter is a lambda function specifying that x[1] (the shot conversion ratio) is the main metric used to rank the elements of shotacc_allteams. In case the shot conversion ratio is the same, x[3] (the number of shots) is used to rank the tied teams. Since tied teams have the same ratio, the team with more shots is also the one with more goals.
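To see how the tuple key breaks ties, here is a small example that applies the same sort to three entries copied from the top 10 listed further down. In each tuple, index 1 holds the conversion ratio and index 3 the number of shots; the variable names sample and ranked_names are only used for this illustration.

```python
# Three entries copied from the results in this article:
# (team, shot_acc, goals, shots, div)
sample = [
    ('Mainz', 0.2, 8, 40, 'D1'),
    ('Chelsea', 0.2037037037037037, 11, 54, 'E0'),
    ('Santa Clara', 0.2, 10, 50, 'P1'),
]

# Same key as above: conversion ratio first, number of shots as tie-breaker
sample.sort(key=lambda x: (x[1], x[3]), reverse=True)

ranked_names = [entry[0] for entry in sample]
print(ranked_names)  # ['Chelsea', 'Santa Clara', 'Mainz']
```

Chelsea comes first because of its higher ratio, while Santa Clara and Mainz share a ratio of 0.2 and are separated by the number of shots (50 against 40).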

Finally, to select only the top 10 teams by shot conversion ratio, we will just slice the sorted list and take the first 10 elements.

shotacc_topteams = shotacc_allteams[:10]
shotacc_topteams
# output
[('Marseille', 0.3, 15, 50, 'F1'),
 ('Nice', 0.22950819672131148, 14, 61, 'F1'),
 ('Bayern Munich', 0.2191780821917808, 16, 73, 'D1'),
 ('Paris SG', 0.21794871794871795, 17, 78, 'F1'),
 ('Celta', 0.20588235294117646, 14, 68, 'SP1'),
 ('Chelsea', 0.2037037037037037, 11, 54, 'E0'),
 ('Santa Clara', 0.2, 10, 50, 'P1'),
 ('Mainz', 0.2, 8, 40, 'D1'),
 ('Verona', 0.2, 8, 40, 'I1'),
 ('Strasbourg', 0.19642857142857142, 11, 56, 'F1')]

It’s now time to visualize these data. Below you can see a bar chart that includes all the teams above, ordered by their conversion ratio. The teams with the best forwards should appear at the top, as they have converted their chances in the most efficient way so far.

Marseille is way ahead of the rest. Mason Greenwood is having a great start to the season for them, definitely back to his old levels at United. Bayern Munich is still high up there, with Kane, Musiala and Olise all involved in goals so far. It’s a bit surprising to see small teams like Verona and Strasbourg there too. Their numbers might be due to the small sample size (Verona, for example, has taken only 40 shots), but let’s keep an eye on them for the rest of the season.

Shots conversion rate (major Euro leagues)
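The plotting code is not part of the article. As a rough sketch, a similar horizontal bar chart could be drawn with matplotlib as shown below. The choice of matplotlib, the axis label and the output file name are assumptions made for this example; the data is the top 10 list printed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Top 10 copied from the output above: (team, shot_acc, goals, shots, div)
shotacc_topteams = [
    ('Marseille', 0.3, 15, 50, 'F1'),
    ('Nice', 0.22950819672131148, 14, 61, 'F1'),
    ('Bayern Munich', 0.2191780821917808, 16, 73, 'D1'),
    ('Paris SG', 0.21794871794871795, 17, 78, 'F1'),
    ('Celta', 0.20588235294117646, 14, 68, 'SP1'),
    ('Chelsea', 0.2037037037037037, 11, 54, 'E0'),
    ('Santa Clara', 0.2, 10, 50, 'P1'),
    ('Mainz', 0.2, 8, 40, 'D1'),
    ('Verona', 0.2, 8, 40, 'I1'),
    ('Strasbourg', 0.19642857142857142, 11, 56, 'F1'),
]

# Label each bar with the team and its league code, and express the ratio in percent
labels = [f"{team} ({div})" for team, _, _, _, div in shotacc_topteams]
rates = [100 * shot_acc for _, shot_acc, _, _, _ in shotacc_topteams]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.barh(labels, rates)
ax.invert_yaxis()  # put the best team at the top
ax.set_xlabel("Goals per shot (%)")
ax.set_title("Shots conversion rate (major Euro leagues)")
fig.tight_layout()
fig.savefig("shot_conversion_top10.png")  # hypothetical output file name
```

In an interactive session or notebook, the matplotlib.use("Agg") line can be dropped and plt.show() called instead of saving the figure to a file.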

If you liked the analysis and want to know more about soccer analytics applied to the betting markets, I have written a few books where I go into the details of how to get the data, visualize and train a model to predict football results for the Premier League, La Liga, Serie A, Bundesliga and the other major European national tournaments, complete with code examples.

Check out the books on

Antonio Author of Code a Soccer Betting model in a Weekend, Soccer Betting Coding and Build a Soccer Betting Model for Euro 2024