
Introduction
Growing up, I loved watching sports such as baseball, basketball, and hockey, and I always cheered for my favourite athletes, Jeremy Lin and José Bautista. Over time, I developed a deep appreciation for the metrics and statistics in professional sports leagues. As I followed more and more sports, scrolling, swiping, and clicking across several platforms to track the statistics of teams and athletes became increasingly tedious.
Objective
Create a centralized application and visualization tool for fantasy sports with Python and Tableau. My tool will retrieve statistics for chosen teams and athletes across professional sports leagues such as the NHL, NFL, NBA, and MLB. Ultimately, my tool will enable simplified and personalized sports tracking.
Approach
- Data Collection and Storage
- User Interface and Customization
- Data Visualization and Exploration
In this initial phase, the focus will be on researching and identifying reliable data sources for sports statistics. A robust data collection and storage mechanism will be developed using Python, ensuring accuracy and consistency of the data. This phase will involve careful selection of sources to ensure the data quality aligns with the project’s goals.
In the second phase, the project will shift its focus to designing a user-friendly and intuitive interface. This phase involves implementing user customization features, such as the ability to select favourite teams, sports, and statistics. A responsive layout will be developed to ensure compatibility across desktop, tablet, and mobile devices.
In the third phase, I will create interactive visualizations and dashboards using Tableau. This phase aims to let users explore the data through filtering, sorting, and aggregation. Visual aids like charts, graphs, and maps will be incorporated to make the data easier to understand and to give users clear, actionable insights.
By leveraging Python and Tableau, the tool will provide sports enthusiasts with a personalized and convenient way to track their favourite teams and athletes across multiple professional sports leagues. The final solution will integrate data collection, user customization, and dynamic visualizations into a single tool for sports fans.
Challenges and Lessons Learned
Challenge #1: Selecting the right storage medium
Strops was a personal project, so I wanted to challenge myself and use a new platform to learn about database techniques. I chose PostgreSQL, a free, open-source relational database. Other options included MS-SQL, non-relational databases such as MongoDB, and cloud providers like GCP and AWS, but as a start, I wanted to learn about maintaining a relational database.
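To make the relational approach concrete, here is a minimal sketch of what a schema for teams, athletes, and per-game statistics could look like. The table names, column names, and helper functions are hypothetical and are not taken from the actual project. The sketch also uses Python's built-in sqlite3 module as a stand-in so that it runs without a database server; with PostgreSQL the data types and the driver would differ.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS teams (
    team_id INTEGER PRIMARY KEY,
    league  TEXT NOT NULL,
    name    TEXT NOT NULL,
    UNIQUE (league, name)
);
CREATE TABLE IF NOT EXISTS athletes (
    athlete_id INTEGER PRIMARY KEY,
    team_id    INTEGER NOT NULL REFERENCES teams(team_id),
    full_name  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS stat_lines (
    athlete_id INTEGER NOT NULL REFERENCES athletes(athlete_id),
    game_date  TEXT NOT NULL,
    stat_name  TEXT NOT NULL,
    value      REAL NOT NULL,
    PRIMARY KEY (athlete_id, game_date, stat_name)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def season_total(conn: sqlite3.Connection, full_name: str, stat_name: str) -> float:
    """Sum one statistic across all stored games for one athlete."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(s.value), 0)
        FROM stat_lines AS s
        JOIN athletes AS a ON a.athlete_id = s.athlete_id
        WHERE a.full_name = ? AND s.stat_name = ?
        """,
        (full_name, stat_name),
    ).fetchone()
    return row[0]
```

Keeping one row per athlete, game, and statistic means a new sport or metric needs no schema change, at the cost of a join and an aggregation for every summary.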
Challenge #2: Ensuring data has been verified
The veracity of the data was also in question, as there were many websites and APIs that I could have retrieved my data from. For example, I could pull from APIs provided by the leagues, web-scrape using the Beautiful Soup library, or even tap into existing sports databases. However, how would I know whether the statistics matched across all platforms, or whether they were even correct?
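One way to approach this question is to pull the same statistics from two sources and flag every value that disagrees. The sketch below assumes each source has already been reduced to a plain dictionary mapping a statistic name to a number. The function name and the input shape are hypothetical, not part of the original project.

```python
from math import isclose


def find_discrepancies(source_a: dict, source_b: dict, tolerance: float = 1e-9) -> dict:
    """Return {stat_name: (value_a, value_b)} for every stat that differs.

    A stat missing from one source is reported with None on that side.
    """
    discrepancies = {}
    for stat in sorted(set(source_a) | set(source_b)):
        a = source_a.get(stat)
        b = source_b.get(stat)
        # Missing on either side, or numerically different beyond the tolerance.
        if a is None or b is None or not isclose(a, b, abs_tol=tolerance):
            discrepancies[stat] = (a, b)
    return discrepancies
```

A check like this only shows that two sources disagree. It cannot say which one is right, so a flagged value would still need a third source or a manual look.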
Outcomes and Next Steps
- Streaming Processes with Kafka
- Re-Factoring Scripts for Cloud Integration
- Implement Advanced Analytics & Machine Learning
- Create an Automated Reporting System
The next step involves integrating real-time data streaming processes using Apache Kafka. This will allow for the continuous ingestion and processing of live data, ensuring that the analysis remains up to date and responsive to real-time changes, such as live sports statistics or other real-time metrics.
I will also focus on refactoring the existing scripts to be cloud-compatible, optimizing them for scalable cloud environments like AWS or GCP. This will involve adjusting the architecture for distributed processing and storage, allowing the system to efficiently handle increased data volume and provide more flexible, scalable solutions.
Integrating machine learning models for predictive analytics can offer deeper insights. For example, predicting future trends based on historical data (e.g., predicting player performance or game outcomes), or identifying hidden patterns in the data through unsupervised learning methods, could add significant value to the analysis.
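As a very small illustration of predicting from historical data, the sketch below fits a straight line to a sequence of past values using ordinary least squares and extrapolates one step ahead. This is a deliberately simple baseline rather than a planned model, and the function names are hypothetical. It treats the games as evenly spaced and ignores everything except the trend.

```python
def fit_linear_trend(values):
    """Least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_next(values):
    """Extrapolate the fitted line to the next (unseen) position."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * len(values)
```

A baseline like this is mainly useful as a yardstick: a more elaborate model should beat it on held-out games before it is worth the extra complexity.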
Finally, I plan to set up an automated system that generates and sends out reports based on the data analysis. This would let stakeholders or users receive regular insights without manual intervention. Incorporating natural language generation (NLG) could create human-readable summaries of insights, making the information more accessible to non-technical users.
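The simplest form of such a summary is a fixed sentence template filled in with computed values. The sketch below shows that idea only. The function name and wording are hypothetical, and a real NLG component would likely be more flexible than a single template.

```python
def summarize_stat(athlete: str, stat_name: str, values: list) -> str:
    """Build a one-sentence, human-readable summary of one statistic."""
    if not values:
        return f"No {stat_name} data is available for {athlete}."
    average = sum(values) / len(values)
    best = max(values)
    games = "game" if len(values) == 1 else "games"
    return (
        f"{athlete} averaged {average:.1f} {stat_name} over "
        f"{len(values)} {games}, with a high of {best}."
    )
```

The resulting string could then be placed in an email body or a dashboard caption by whatever scheduler ends up running the report.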