Python fantasy football machine learning
With the American football season fast upon us, there are plenty of folks here at DataRobot who are busy gathering historical football stats and brushing up on their models in anticipation of fantasy football. In this post, we asked four of our hardcore fantasy experts to share how they geek out by applying data science to football.
Ben is a data scientist and a former professional basketball player. Ben has written about evaluating player performance in the NBA (see his blog post here). And, when it comes to football, Ben takes a similar approach: it all begins with gathering historical football data. Ben uses fantasydata NFL as one source of historical player statistics.
If he is playing DraftKings, he uses the data from fantasydata NFL to predict the number of DraftKings points each player will score by setting that as the target in his DataRobot project. The idea here is to build a predictive model that understands the relationships between historical statistics and DraftKings fantasy points.
The features created include a set of simple historical lags like means, standard deviations, medians, minimums, and maximums, computed over many different time periods for each statistic. DataRobot also automatically engineers more complicated time series features, like Bollinger bands and rolling entropy. Being able to make predictions for the future is how you can use the top model to predict how players will perform next week.
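Such lag features can be sketched with pandas; the column names, window sizes, and sample numbers below are illustrative, not the actual fantasydata NFL schema.

```python
import pandas as pd

# Toy weekly stats for one player; values are made up for illustration.
df = pd.DataFrame({
    "week": range(1, 9),
    "rushing_yards": [80, 45, 120, 60, 95, 30, 110, 70],
})

# Simple historical lag features over a few window sizes. shift(1)
# ensures each row only sees strictly earlier weeks (no leakage).
for window in (3, 5):
    past = df["rushing_yards"].shift(1).rolling(window, min_periods=1)
    df[f"rush_mean_{window}"] = past.mean()
    df[f"rush_std_{window}"] = past.std()
    df[f"rush_min_{window}"] = past.min()
    df[f"rush_max_{window}"] = past.max()
```

The same pattern extends to any per-week statistic and any set of windows.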
DataRobot will then build and test many models and sort each model by predictive accuracy using out-of-sample validation data. Once modeling is complete, each model is immediately available to make predictions on new data. Ben uses a little secret sauce when he builds models by adding text data. DataRobot includes a text mining engine, which automatically extracts predictive information from unstructured text.
Adding some player fantasy news to your data will add valuable information. Finally, Ben also includes other sophisticated data in the form of Las Vegas and Fantasy Sports site predictions. These types of data rely on highly sophisticated analytics and other modeling techniques. Try experimenting with these and other data to quickly see if your models improve. Taylor is a data scientist with DataRobot University.
Taylor plays a pick'em format: each week you pick the winner of every game and, depending on the format, you may need to rank how confident you are about each pick. As the weeks progress, you earn points based on how many games you picked correctly and at what confidence level. The trick to gaining an advantage is to mitigate your risk better than everyone else. Using publicly available information about the point spread for a given game, Taylor manually builds models to determine how to best rank these games in a risk-averse way.
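A minimal version of this risk-averse ranking is to order games by the size of the point spread; the matchups and spreads below are hypothetical.

```python
# Hypothetical games for one week, each with the favorite's point spread.
games = [
    ("KC vs DEN", -9.5),
    ("DAL vs NYG", -3.0),
    ("BUF vs MIA", -6.5),
]

# Larger absolute spread -> more lopsided game -> higher confidence rank.
ranked = sorted(games, key=lambda g: abs(g[1]), reverse=True)
confidence = {game: len(games) - i for i, (game, _) in enumerate(ranked)}
```

Taylor's actual models go beyond this, but the spread is the core signal.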
With the opportunity to use DataRobot this past year, he turned his more manual process into an automated machine learning workflow.

Using the ESPN Fantasy Football API (in Python)

Fantasy football season approacheth.
Your heart longs to analyze the scoring distribution in your league by week, by team, by player — to finally quantitatively question the predictive power of projected points — to confirm your hypothesis that you got an unfair slate of opponents in the pre-playoff weeks … and yet you know not how.
Copy-paste data from a webpage? Do some expert-level web scraping? ESPN exposes its league data through a (hidden) API, so you can skip the hassle and just use this excellent work. The steps: import the requests package, initialize a dict called scores to hold score information, and loop over the weeks, issuing a GET request for each. The GET request, with parameters, is essentially equivalent to entering the corresponding URL into a browser.
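A sketch of that loop is below. The endpoint path reflects the scoreboard API commonly cited at the time (it has since changed), and the league ID and season are placeholders; treat both as assumptions.

```python
from urllib.parse import urlencode

# Placeholder league/season; substitute your own values.
url = "http://games.espn.com/ffl/api/v3/scoreboard"
scores = {}
for week in range(1, 14):
    params = {"leagueId": 123456, "seasonId": 2018,
              "matchupPeriodId": week}
    # With requests: scores[week] = requests.get(url, params=params).json()
    # The equivalent browser URL looks like this:
    full_url = url + "?" + urlencode(params)
```

Pasting `full_url` into a browser shows the same JSON the script receives.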
It is worth poking around this nested collection of information. To extract the first matchup of week 1, we would do something like scores[1]['scoreboard']['matchups'][0]. To extract the home score for this matchup, we would index deeper and call scores[1]['scoreboard']['matchups'][0]['teams'][0]['score'].
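The indexing can be tried on a toy structure mirroring the assumed shape of the scoreboard JSON (field names here are assumptions, not ESPN's exact schema):

```python
# Toy nested structure standing in for one week's scoreboard JSON.
scores = {
    1: {"scoreboard": {"matchups": [
        {"teams": [
            {"team": {"teamId": 1, "name": "Team A"}, "score": 98.5},
            {"team": {"teamId": 2, "name": "Team B"}, "score": 101.2},
        ]},
    ]}},
}

first_matchup = scores[1]["scoreboard"]["matchups"][0]
home_score = first_matchup["teams"][0]["score"]
```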
To make a clean table of all the team IDs, names, and scores for all weeks, we can loop over the nested dict and collect rows. The %matplotlib inline line is some magic to get inline plots in a Jupyter notebook; omit it if you are working in another setting. Now we do some plots. A few stories here: high scorers are, unsurprisingly, in higher standing than low scorers.
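Flattening the nested dict into tidy rows might look like the following (toy data in the assumed JSON shape; feed `rows` to `pd.DataFrame` for the table):

```python
# Toy nested scores dict, one week with one matchup.
scores = {
    1: {"scoreboard": {"matchups": [
        {"teams": [
            {"team": {"teamId": 1, "name": "Team A"}, "score": 98.5},
            {"team": {"teamId": 2, "name": "Team B"}, "score": 101.2},
        ]},
    ]}},
}

rows = []
for week, data in scores.items():
    for matchup in data["scoreboard"]["matchups"]:
        for side, entry in zip(("home", "away"), matchup["teams"]):
            rows.append({"week": week, "side": side,
                         "team_id": entry["team"]["teamId"],
                         "name": entry["team"]["name"],
                         "score": entry["score"]})
# pd.DataFrame(rows) then gives one row per team per week.
```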
But playoff performance absolutely does matter for the playoff teams (in this case, the top 4) — in fact, Player D entered the playoffs as top seed and finished 4th. Player E had the best playoff performance but had too many mediocre games in the regular season. All tales as old as time. I thought about redoing the above process in R, but realized DTDusty already did it better: check out his blog over here.
All the desired info will pop up. EDIT: check out the follow-on posts on how to get boxscores and then how to deal with private leagues.
Using Monte Carlo Tree Search for your Fantasy Football draft
On daily fantasy sites such as FanDuel Inc., you will typically have a set salary cap, and from that salary cap you can spend money on players to set a fantasy lineup. The most common lineup format is shown in Table 1 below. Once lineups have been set, fantasy teams gain points via actual NFL game statistics. For example, a running back will typically receive 1 point for every ten yards rushing in a game.
Different leagues have different point settings. For the analysis conducted in this report we assume standard PPR league scoring, in which a player is awarded additional points for each reception. Data from multiple sources were used. NFL player stats are available for all games in the seasons covered. For the models created in this report, 50 different statistics were used. Example R scripts have been uploaded on Canvas. Player data has been scraped for the full prior season and for the completed weeks of the current season. The final data source is projected fantasy player data for the top 50 players at each position, excluding team defenses.
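The scoring itself is simple arithmetic. A hedged sketch for a single stat line is below; the 1-point-per-10-rushing-yards rule is from the text, while the receiving-yard and touchdown values are common defaults, and the per-reception value is a parameter since league settings vary.

```python
# Hypothetical PPR scoring for one player's weekly stat line.
def fantasy_points(rush_yards, rec_yards, receptions, touchdowns,
                   ppr_value=1.0):
    points = rush_yards / 10          # 1 point per 10 rushing yards
    points += rec_yards / 10          # 1 point per 10 receiving yards
    points += receptions * ppr_value  # PPR bonus per catch
    points += touchdowns * 6          # 6 points per touchdown
    return points
```

For example, 100 rushing yards, 20 receiving yards, 4 catches, and a touchdown comes to 22 points in full PPR.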
A Python script was written to join the data for all players across all weeks of both seasons. The appropriate Python scripts have been uploaded to Canvas. Joining the data was not a completely straightforward process, as each data set had different player identification numbers, some players have similar names, and team names were not always abbreviated consistently.
Several verification steps were taken to make sure that the data was joined in a consistent manner. Fifty different statistics for each player are included in the model. Intuitively, one would expect that past good performance would dictate future good performance. Examples of statistics are passing yards, passing touchdowns, interceptions thrown, rushing yards, receptions and fumbles lost. Binary indicator variables were created for several additional features. For a given player for a given week, dummy variables were created to indicate whether the player is playing the game at home or away.
Intuitively, one would expect a player to perform better during home games. The quality of the team that a player is playing against will also influence the amount of points scored. Several other variables were considered for the model but not included due to lack of availability or impracticality of using the data. Conversely, if a star defensive player is missing on the opposing team, it is likely that an offensive player will score more points that week.

Note: We started a new blog where I will be running the updated algorithm and posting updates on how my team is doing throughout the 19–20 season here.
Our research started by pulling the most up-to-date player stats from the Fantasy Game API and running statistical analysis on all the EPL teams and all individual players using Python. Key questions our analysis aims to answer:.
Note: This project was executed on Nov 14th, after Gameweek 10, so all the data and tables presented in the article are accurate up to that date. We can clearly see that, in general, there is a linear correlation between how well a team is doing in the English Premier League and the cumulative fantasy points of its players. This also exposes a few of the teams that would be considered a bad investment, such as Tottenham, Arsenal, Manchester United, Fulham, Huddersfield, West Ham, and Southampton.
This will help us identify the teams that have too many expensive, under-performing players who rarely play the full 90 minutes each game due to the frequent squad rotation their coach employs. Such players are a bad investment in the long run, since they will not generate fantasy points consistently each game. The graph below will also help us identify the teams whose coaches do not rotate players frequently, which results in those teams having a more consistent core of regular squad players.
That will inform our algorithm to pick more players from those teams, since their players are expected to generate a higher aggregate ROI in the long run: they will be involved in much more game action on average than players from teams that rotate frequently.
That means that most of Wolverhampton's players are undervalued compared to their performance and that the coach uses the same 11 players on a regular basis and only uses subs towards the end of the game or when a regular team player gets injured.
Even teams like Manchester City, Liverpool and Chelsea are in this category with 13—14 regular squad players, which means that picking players from any of the teams above is a good investment in the long run because the regular squad players play more minutes on average compared to players on the bench. After identifying which teams yield a higher cumulative ROI, we then zoomed in on the individual players.
In stock market terms, we have identified all the high-yield market sectors — the teams — and now we want to start analyzing all the individual stocks in each sector — the players.
The plan is to isolate a list of players with the highest ROIs and write a Python algorithm that will use smart logic to pick the most optimal combination of players, which will generate the highest return on investment for our limited budget of MM.
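Computing ROI per player is straightforward: fantasy points per million of cost. The players, costs, and point totals below are made-up placeholders.

```python
# Toy player data; costs in the game's millions, points accumulated so far.
players = [
    {"name": "Salah",   "cost": 13.0, "points": 80},
    {"name": "Doherty", "cost": 4.5,  "points": 45},
    {"name": "Kane",    "cost": 12.5, "points": 60},
]

# ROI as fantasy points per million spent.
for p in players:
    p["roi"] = p["points"] / p["cost"]

top_by_roi = sorted(players, key=lambda p: p["roi"], reverse=True)
```

A cheap, productive player can out-rank a star on this metric even with fewer raw points.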
Looking at the scatter plot of Player Cost vs. Player Total Fantasy Points above, we would want our AI to pick players who appear as far to the north-west of the plot as possible (players of low cost who generate a lot of fantasy points).
Note that we would also want to include some of the top players from the north-east corner of the plot, since these are the star league players who generate a lot of points; even though they are a bit expensive, they still end up with a decent ROI. The graph below plots the Top 20 players by ROI. In the pie charts below, we can see a distribution of the teams with the most overpriced players versus the teams with the most undervalued players.
We are expecting our final algorithm to pick players from a variety of teams that have a lot of high-yield players, such as Bournemouth, Wolves, Liverpool, Chelsea, Manchester City, Watford, and Everton.
To understand the logic of our algorithm, one must first understand the rules and constraints of the EPL Fantasy Game. We started our Python algorithm with an if-else statement for these conditions and then added our own conditions and logic on top, so that each time the algorithm loops through our list of players it can use smart logic to make a valid pick guided by the conditions below:
Here is some of our condensed Python code for the team picking algorithm. Below you can see a screenshot of the final team that our algo picked. Note: the team below is only accurate up to Nov 14th. Note: We wrote a similar algorithm for the AVG Joe team, which focuses more on spending the budget on star players from big teams, who are often overpriced and might not return the highest cumulative ROI for our limited budget.
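The original condensed listing is not reproduced here; a minimal greedy sketch under the standard FPL constraints (15-player squad, 100.0M budget, fixed position quotas, max three players per club) might look like the following. The constraint values and player fields are assumptions.

```python
# Assumed EPL Fantasy constraints.
QUOTAS = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
BUDGET = 100.0
MAX_PER_CLUB = 3

def pick_team(players):
    # Greedily take the best-ROI player that still fits every constraint.
    squad, spent = [], 0.0
    counts = {pos: 0 for pos in QUOTAS}
    clubs = {}
    for p in sorted(players, key=lambda p: p["points"] / p["cost"],
                    reverse=True):
        if (counts[p["pos"]] < QUOTAS[p["pos"]]
                and clubs.get(p["club"], 0) < MAX_PER_CLUB
                and spent + p["cost"] <= BUDGET):
            squad.append(p)
            spent += p["cost"]
            counts[p["pos"]] += 1
            clubs[p["club"]] = clubs.get(p["club"], 0) + 1
    return squad
```

A pure greedy pass is only an approximation; the budget/quota problem is a knapsack variant, so smarter search can do better.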
Note: We also asked a classmate to pick a random team of his own, so we could compare his picks and verify that our random team-selector function for the AVG Joe algorithm is accurate. The final results compare the total points scored by our team versus the others; the bar plot below demonstrates the results. Did it beat the others by a significant margin?
Did our Algorithm successfully pick players from some middle of the table teams, which we initially identified as undervalued?
Did the AVG Joe Algorithm and our classmate pick more of the expensive, overpriced players from the top teams?
Below we can see that our Algorithm picked a combination of players from most of the high-ROI teams that we identified at the beginning of our project.

Python is the new Excel for fantasy football analysis, allowing you to analyze player, team, and league stats. Our book will start you with Python from square one.
Next, you'll learn Python data libraries like Pandas and Seaborn that allow you to create data visualizations. Our book will walk you through each library step by step! By the end of our book and tutorials, you'll be able to create data visualizations that'll help you make calculated waiver wire pickups and data-informed draft picks.
With our tutorials and book, you'll learn how to create data visualizations with Python to inform your fantasy football decisions.
You'll learn how to do all the stuff listed below and more! Intro to Machine Learning and Fantasy Football. Learn Python with the NFL.
Scraping Fantasy Football Data.

The code in this post is also available as a Jupyter notebook. Two more months till the next American Football season kicks off, which means Fantasy Football players around the world are preparing for their upcoming league draft. In this post we will use the Monte Carlo Tree Search algorithm to optimize our next pick in a typical snake draft. We will focus mainly on the Python 3 implementation of the draft logic, leaving the details of the algorithm to the references in the text while still providing the code.
Each competitor picks the real-world football players that will make up his initial roster. A bad draft can really ruin your season, so a lot of study is spent on picking the right players for your team. In most leagues picking is done in turn by snake order (1 to 10, then 10 to 1, etc.). This means filling the right roster position at the right time is a big part of your strategy. If most of your competitors start by picking wide receivers, should you follow the crowd or go against it?
That is exactly the question we want to answer with our algorithm. But first, the draft logic. We will need an object that describes the exact state of our draft. We can then value each position in the lineup based on some weighting. The weights are essential since we cannot use the same lineup of players each week of the season, due to bye weeks, injuries, or strategic decisions during the season.
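A sketch of such a draft-state object is below; the class and field names are assumptions, not the author's actual implementation, but the snake-order logic is the one described.

```python
from dataclasses import dataclass

@dataclass
class DraftState:
    num_teams: int
    picks_made: int = 0
    rosters: list = None  # one list of drafted players per competitor

    def __post_init__(self):
        if self.rosters is None:
            self.rosters = [[] for _ in range(self.num_teams)]

    def on_the_clock(self):
        # Snake order: 0..n-1 on even rounds, n-1..0 on odd rounds.
        rnd, pos = divmod(self.picks_made, self.num_teams)
        return pos if rnd % 2 == 0 else self.num_teams - 1 - pos

    def make_pick(self, player):
        self.rosters[self.on_the_clock()].append(player)
        self.picks_made += 1
```

With 10 teams, pick 10 (the 11th overall) goes back to the team that picked 10th, as in a real snake draft.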
Based on some experimenting, the following weights look okay. A roster can now be valued by mapping each player — starting with the first pick and moving down — to the highest available weight for his position. As a detail, we will give less priority to the flex position in case of a tie. If any weights at the end are not mapped to a player on the roster, the average value of the top three free agents of the corresponding position is mapped to that weight instead — kind of like streaming that position.
The code looks as follows. Note that this method contains a lot of assumptions. Most notably, it is purely based on season projections and ignores things like (co)variance or strength of schedule. Also, it only looks at the value of each roster in isolation, not in comparison to the others.
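The original listing is not reproduced in this excerpt; below is a sketch of the valuation as described, with made-up weight values and a simplified flex rule.

```python
# Illustrative position weights (not the author's tuned values).
WEIGHTS = {"QB": [1.0, 0.3], "RB": [1.0, 0.9, 0.4], "WR": [1.0, 0.9, 0.4],
           "TE": [1.0, 0.2], "FLEX": [0.6]}

def roster_value(roster, free_agents):
    """roster: list of (position, projected_points) in draft order.
    free_agents: dict position -> projections sorted descending."""
    remaining = {pos: list(w) for pos, w in WEIGHTS.items()}
    total = 0.0
    for pos, proj in roster:
        # Map each player to the highest available weight for his
        # position, spilling RB/WR/TE into the flex slot when full.
        if remaining.get(pos):
            total += remaining[pos].pop(0) * proj
        elif pos in ("RB", "WR", "TE") and remaining["FLEX"]:
            total += remaining["FLEX"].pop(0) * proj
    # Unmapped weights are "streamed": average of the top three FAs.
    for pos, weights in remaining.items():
        for w in weights:
            pool = free_agents.get(pos, [0.0])[:3]
            total += w * sum(pool) / len(pool)
    return total
```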
These assumptions should be studied at a later point. In theory one can pick any available free agent, but fortunately we can simplify things considerably. Since our valuation is only based on season projections per player, it never makes sense to pick a free agent with a lower projection than another free agent at the same position. This limits the available moves to picking a position and then taking the most valuable free agent at that position. Furthermore, due to the weights used in the valuation, it does not make sense to draft, for example, three quarterbacks; the third one would not add any more value.
In practice, the following limits per position seem reasonable. This results in the following code. The last piece of logic concerns updating the draft state after each pick; the code is straightforward. Finally, our algorithm requires a Clone method so multiple simulation runs do not interfere with one another.

Monte Carlo Tree Search is a heuristic search algorithm for finding the best next move in turn-based games where it is practically impossible to consider all possible moves and their ultimate results.
Since our snake draft is also turn-based with lots of possible moves at each turn, MCTS seems like a great choice to beat our competitors. The basic idea behind the algorithm is to simulate lots of game plays, first by picking random moves and then converging on the best move by focusing on the most promising ones. It balances a trade-off between exploration (finding more promising moves) and exploitation (focusing on the most promising moves).
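This exploration/exploitation balance is commonly handled by the UCT selection rule; the post does not show its implementation, so the following is a generic sketch.

```python
import math

# UCT score for a child node: average reward plus an exploration bonus
# that shrinks as the node is visited more often. c is the exploration
# constant; sqrt(2) is a typical default.
def uct(wins, visits, parent_visits, c=math.sqrt(2)):
    if visits == 0:
        return float("inf")  # always try unvisited moves first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

At each step of the search, the move with the highest UCT score is selected, which naturally shifts effort toward promising branches over time.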
Every week, team managers select their starting roster to compete against an opponent within their league. The objective is simple: select players that have the highest likelihood of scoring the most fantasy points by accruing touchdowns, yards, receptions, etc.
The avalanche of options supported by the vast amount of available data is at once intriguing and daunting. Millions of words are printed daily in articles and blog posts. These articles contain valuable information, but tracking them all on a daily or even weekly basis requires more time than a 24-hour day allows. Enter IBM Watson. We have trained Watson with the ability to read, comprehend, and interpret millions of news articles and social content. First, Watson must be trained to understand the domain of fantasy football.
A combination of human annotators, data scientists, and developers took Watson through a series of supervised training based on years of previous fantasy football content including archives of the Internet.
An ontology mapped fantasy football unstructured data to 12 entity types and relationships. A group of human annotators used Watson Knowledge Studio to markup and associate thousands of textual phrases to entities. A statistical machine learning entity model was trained from the labeled data and published to Watson Discovery.
Now Watson was able to read and understand millions of news articles within the context of fantasy football. Watson found the most relevant entities, keywords, and concepts of every article that was then sent to a pipeline for comprehension. The machine-learning pipeline was developed to enable Watson to comprehend the unstructured text.
Over 90 gigs of raw text from historical fantasy football seasons were ingested into a document-to-vector (doc2vec) model. A second, more precise doc2vec model was created from several thousand lines of definitions and football encyclopedias.
Both doc2vec models were merged so that the textual representation of hundreds of articles could be converted into numerical vector representations of fixed length. The performance of the doc2vec ensemble model was outstanding: it was able to infer the correct answers on a test set, and it was also evaluated on a second, perhaps more difficult, keyword test. Now that the raw text has been converted into a numerical representation, the pipeline calls a deep learning layer of models.
Several deep neural networks with over 90 layers were trained to determine whether a player was going to boom, bust, or play with an injury. The activation functions of the neural networks were a mixture of tanh and rectified linear units (ReLU). Batch normalization was used to increase training speed, while dropout nodes helped to prevent overfitting. The final layer of all the neural networks was a sigmoid, since we converted the problem to a binomial classifier.
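The sigmoid output and the binary cross-entropy loss it is paired with can be written out explicitly; these are the generic textbook definitions, not IBM's implementation.

```python
import math

# Sigmoid squashes a raw network output into a probability in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Binary cross-entropy: low when the predicted probability p_pred
# matches the true label y_true (0 or 1), large when it contradicts it.
def binary_cross_entropy(y_true, p_pred):
    return -(y_true * math.log(p_pred)
             + (1 - y_true) * math.log(1 - p_pred))
```

Training then minimizes the average of this loss over the labeled boom/bust examples.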
Stochastic gradient descent was the optimizer over the binary cross-entropy loss function. The boom and bust model selection was difficult due to the imbalanced classes and was a trade-off between accuracy and distribution. All four models selected a reasonable number of players, consistent with historical performance, while maintaining high enough accuracy for meaningful insights. Each of the deep player classifiers was normalized so that they could be compared.
Finally, a random forest model was trained to map deep classifier confidence to player percentage.