Last year I participated in an inter-office AFL (Australian football) tipping competition. I have always been interested in sports analytics, and it turns out that AFL is one of the most data-rich sports in the world. So this was a perfect opportunity to practice some Python machine learning, and to hopefully win the office sweepstakes!
My machine learning algorithm used historical data on team performance (point difference) to estimate the relative "strength" of any two teams. For example, let's say I want to compare Team A and Team B this week. My algorithm analysed all recent match-ups between Team A and Team B, and recorded which team won and by how many points. It then analysed all matches in which Team A and Team B had each played a common opponent (Team X), and compared their margins against that opponent. It continued in this manner to produce a number of metrics, each of which ranked the teams in some numerical order. The machine learning algorithm then found the weighting of these ordinal metrics which most accurately predicted the "stronger" team.
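To make the idea concrete, here is a minimal sketch of the first two metrics (head-to-head margin and common-opponent margin). It is not my original code: the match records, the function names and the hand-picked weights are all made up for illustration, and in the real algorithm the weights were learned from historical results rather than fixed.

```python
from collections import defaultdict

# Hypothetical match records: (home_team, away_team, home_score, away_score)
MATCHES = [
    ("A", "B", 90, 80),
    ("B", "A", 70, 100),
    ("A", "X", 85, 60),
    ("B", "X", 75, 70),
]


def margins_by_pair(matches):
    """Map (team, opponent) -> list of point margins from team's perspective."""
    margins = defaultdict(list)
    for home, away, home_score, away_score in matches:
        margins[(home, away)].append(home_score - away_score)
        margins[(away, home)].append(away_score - home_score)
    return margins


def mean(values):
    """Average of a list, or 0.0 when there is no data."""
    return sum(values) / len(values) if values else 0.0


def head_to_head(margins, team, opponent):
    """Average margin of `team` in its direct matches against `opponent`."""
    return mean(margins.get((team, opponent), []))


def common_opponent(margins, team, opponent):
    """Average gap between the two teams' margins against shared opponents."""
    team_opps = {o for (t, o) in margins if t == team}
    other_opps = {o for (t, o) in margins if t == opponent}
    shared = (team_opps & other_opps) - {team, opponent}
    gaps = [mean(margins[(team, x)]) - mean(margins[(opponent, x)]) for x in shared]
    return mean(gaps)


def strength_score(margins, team, opponent, weights=(0.6, 0.4)):
    """Weighted sum of the two metrics; a positive score favours `team`.

    The default weights are arbitrary placeholders, not learned values.
    """
    w_h2h, w_common = weights
    return (w_h2h * head_to_head(margins, team, opponent)
            + w_common * common_opponent(margins, team, opponent))


margins = margins_by_pair(MATCHES)
score = strength_score(margins, "A", "B")
print("Tip A" if score > 0 else "Tip B")
```

Fitting the weights is the "machine learning" part: each metric becomes a feature, the match result becomes the label, and any standard classifier can be trained on them.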
The algorithm looked great. It was simple in concept and in execution. The only problem: it wasn't very good at predicting the winning team. And so I finished in last place in the inter-office tipping competition that year 😅.
The problem was that my approach was too simple. The basic (only) unit of comparison was a Team. But teams are a sum of their players. And players' performances vary over time and are dependent on a number of factors: their opponents, their confidence, their performance in previous matches, their level of exhaustion. But data on individual players is big and messy.
What I needed was a convenient way to collect data in a format which Python can easily manipulate. Enter pyAFL.
pyAFL is my attempt to create a Python wrapper around the dataset so painstakingly curated by afltables.com. My aim is to make it easier for Pythonistas out there to get their hands on AFL data, and start concentrating on the analytics - and hopefully do better than me! 😃
The pyAFL API allows you to query historical data on Players and Teams. The data is scraped from afltables.com and converted into structured Python objects and pandas DataFrames. These can easily be used in scikit-learn or any other Python analytics tool.
Request caching - All data is scraped from afltables.com. We don't want to make thousands of requests to afltables.com every time we test and run our analytics script. So instead, I've built request caching into pyAFL. Whenever a web request is made to afltables.com, the result is cached locally, and each subsequent request for the same page is served from the cache 👍.
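If you're curious what request caching looks like in general, here is a rough sketch using only the standard library. It is not pyAFL's actual implementation: the `cached_get` function, the `.request_cache` folder and the hashing scheme are assumptions chosen just to show the pattern.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen

CACHE_DIR = Path(".request_cache")  # hypothetical cache location


def _download(url):
    """Fetch a page over the network and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def cached_get(url, fetch=_download, cache_dir=CACHE_DIR):
    """Return the body for `url`, hitting the network only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Hash the URL so it becomes a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"

    if path.exists():
        # Cache hit: read the stored copy, no web request is made.
        return path.read_text(encoding="utf-8")

    # Cache miss: fetch once, store the result, then return it.
    body = fetch(url)
    path.write_text(body, encoding="utf-8")
    return body
```

The first call for a given URL goes out to the network; every later call reads the saved file instead. A sketch like this never expires its entries, so you would need to clear the folder by hand to pick up new results.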
Contributions welcome - this project is open source and a great opportunity for developers participating in Hacktoberfest this year. There are a tonne of improvements to be made, so if you're interested, check out the repo at https://github.com/RamParameswaran/pyAFL. If you're new to open-source software, read the CONTRIBUTING.md for a crash course on how to make your first pull request!