mrmorgan17/PLXG

Premier League eXpected Goals (PLXG)

This repository contains my efforts to predict how many goals a Premier League team will score in a given match.

First, I had to build a dataset that could be used to predict the number of goals scored in a game. To do this, I scraped Premier League match data from FBref. The scraping code is in FBref_scraper.R and FBref_scraper_basic.R: FBref_scraper_basic.R scrapes one season of Premier League match data (2019-2020), and FBref_scraper.R scrapes more than one season. Pl_team_match_data.csv is the dataset I created from FBref. It contains Premier League match data from the 2017-2018, 2018-2019, and 2019-2020 campaigns, with 103 variables per team per match. Descriptions of the variables can be found in FBref_scraper.R and/or on FBref.

The PLXG_modeling.R script contains my attempts to build the model best suited to predicting the number of goals a Premier League team scores in a game, or expected goals. I tried Poisson regression, random forest, ranger, and xgboost models. Initial models were built using all 102 possible explanatory variables, and I found that the best model was an xgbTree model built using the caret library. I was able to improve the xgbTree model's performance by using only the 10 most "important" variables instead of all 102. The performance metric I used to evaluate each model's predictions was Root Mean Square Error (RMSE), and all models were built using an 80/20 train/test split.
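To make the evaluation setup concrete, here is a minimal sketch of an 80/20 train/test split and the RMSE metric. It is written in Python for illustration only and is not the code in PLXG_modeling.R, which is R built on caret. The function names, the seed, and the goal counts are all assumptions made up for this example, and the "model" is just a baseline that predicts the mean of the training goals.

```python
import math
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical goals scored per team per match (not taken from the dataset).
goals = [0, 1, 2, 1, 3, 0, 2, 1, 4, 1]

train, test = train_test_split(goals)

# Baseline "model": always predict the mean number of goals in the training set.
baseline = sum(train) / len(train)
predictions = [baseline] * len(test)

print(f"train size: {len(train)}, test size: {len(test)}")
print(f"baseline RMSE on test set: {rmse(test, predictions):.3f}")
```

A lower RMSE on the held-out 20% means the predicted goal counts are closer to the actual ones, which is how the candidate models above were compared.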

Finally, the PLXG App folder contains the optimal model (PLXGModel.RData), a dataset with only the 10 most "important" variables of match data for each Premier League team (Full_PL_10.csv), and the code for my Shiny application (app.R). The app lets a user walk through my process of getting data and building an expected goals model, and then make predictions with the model themselves. Users can select a team and enter their own values for the variables used to build the model to see how expected goals for a match would change. Users can also select a specific match to see the predicted goal output for a Premier League team against a given opponent on a given date from the 2017-2018, 2018-2019, and 2019-2020 Premier League campaigns. Additionally, users can look at visualizations of the variables for each Premier League team to get a better idea of which values are realistic for each team.

Here is the link to the PLXG app. It is hosted on https://www.shinyapps.io.

