
Problem
In urban environments, personal safety is a major concern for residents, especially during commutes or when visiting unfamiliar areas. Despite the availability of safety statistics and real-time alerts, there is a lack of accessible tools that enable individuals to make informed decisions about their routes or locations based on safety. Current solutions often fail to integrate live data, predictive analytics, and user-specific preferences, leaving users without actionable insights to mitigate risks effectively.
Approach
The development of safeTO follows a four-phase methodology designed to produce a robust, insightful platform:
- Data Extraction
- Data Transformation
- Data Loading and Analysis
- Predictive Modeling and Insights
Data Extraction: Data is collected from the Toronto Police Service and the Toronto Open Data Portal, focusing on critical statistics for neighborhoods across the city. This phase emphasizes sourcing accurate, up-to-date information essential for meaningful analysis.
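As a rough sketch of this step, the snippet below pulls a dataset through the CKAN-style API that the Toronto Open Data Portal exposes; the base URL and the `neighbourhood-crime-rates` package id are illustrative assumptions rather than confirmed project values:

```python
import requests
import pandas as pd

# Base URL for the City of Toronto's CKAN-backed Open Data Portal
# (assumed here; confirm against the portal's developer documentation).
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"

def fetch_dataset(package_id: str) -> pd.DataFrame:
    """Fetch the first CSV resource of a CKAN package as a DataFrame."""
    # package_show returns package metadata, including its list of resources.
    meta = requests.get(
        f"{BASE_URL}/api/3/action/package_show",
        params={"id": package_id},
        timeout=30,
    ).json()["result"]

    # Pick the first CSV resource and load it directly with pandas.
    csv_url = next(
        r["url"] for r in meta["resources"] if r.get("format", "").upper() == "CSV"
    )
    return pd.read_csv(csv_url)

# "neighbourhood-crime-rates" is an illustrative package id.
crime = fetch_dataset("neighbourhood-crime-rates")
print(crime.head())
```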
Data Transformation: Extracted data is cleaned, structured, and prepared for analysis using Python's pandas library. This involves normalizing formats, handling missing values, and creating datasets that are both readable and suited for generating insights.
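A minimal version of such a cleaning pass might look like the following; the column names (`occurrence_date`, `neighbourhood`, year-suffixed count columns) are hypothetical placeholders for whatever schema the real datasets use:

```python
import pandas as pd

def clean_crime_data(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize formats and handle missing values (illustrative schema)."""
    out = df.copy()

    # Normalize column names to a consistent snake_case format.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")

    # Hypothetical columns: parse dates and coerce counts to numeric.
    if "occurrence_date" in out.columns:
        out["occurrence_date"] = pd.to_datetime(out["occurrence_date"], errors="coerce")
    count_cols = out.filter(regex=r"_\d{4}$").columns  # e.g. assault_2023
    out[count_cols] = out[count_cols].apply(pd.to_numeric, errors="coerce")

    # Drop rows missing the neighbourhood key (if present) and
    # treat missing crime counts as zero occurrences.
    if "neighbourhood" in out.columns:
        out = out.dropna(subset=["neighbourhood"])
    out[count_cols] = out[count_cols].fillna(0)

    return out
```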
Data Loading and Analysis: The transformed data is loaded into pandas DataFrames to facilitate in-depth analysis. Techniques such as feature engineering, correlation analysis, and heat-map visualization are employed to uncover patterns and relationships, highlighting the key factors that influence neighborhood safety.
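As an illustration, assuming a cleaned DataFrame like the `crime` table above, a correlation heat map over the numeric columns can be drawn with seaborn; the `total_incidents` column is an example of an engineered feature, not necessarily one the project uses:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Assumes `crime` is a cleaned DataFrame like the one produced above,
# with one row per neighborhood and numeric crime-count columns.
numeric = crime.select_dtypes("number").copy()

# Illustrative engineered feature: total incidents per neighborhood.
numeric["total_incidents"] = numeric.sum(axis=1)

# Correlation heat map to surface relationships between crime metrics.
sns.heatmap(numeric.corr(), cmap="coolwarm", center=0)
plt.title("Correlation between crime metrics (illustrative)")
plt.tight_layout()
plt.show()
```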
Predictive Modeling and Insights: This phase applies predictive models to identify trends and explain why some areas are safer than others. The resulting insights can inform users about the factors affecting safety and support recommendations for risk mitigation.
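One plausible shape for this phase, sketched under the assumption of a hypothetical `features` DataFrame with a per-neighborhood `safety_score` target (neither is a confirmed part of the project), is a tree-based regressor whose permutation importances hint at which factors drive the differences:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical setup: `features` is a per-neighborhood DataFrame of
# engineered predictors plus a `safety_score` target column. Both the
# DataFrame and the column names are placeholders, not the real schema.
X = features.drop(columns=["safety_score"])
y = features["safety_score"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(f"R^2 on held-out neighborhoods: {model.score(X_test, y_test):.2f}")

# Permutation importance suggests which factors drive the predictions,
# i.e. why the model considers some areas safer than others.
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```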
Challenges and Lessons Learned
- Data Cleaning
- Finding Appropriate Data
- Selecting the Correct Storage Medium and Database
Data Cleaning: Ensuring data consistency and accuracy proved challenging due to missing entries, inconsistent formats, and irrelevant information. Developing an efficient cleaning pipeline was essential to maintain the integrity of the analysis.
Finding Appropriate Data: Identifying relevant, high-quality datasets was another challenge. Not all data sources contained the level of detail or reliability required for meaningful insights, making it crucial to vet and curate information carefully.
Selecting the Correct Storage Medium and Database: Determining the optimal storage solution for the project's data needs was a key learning experience. Balancing accessibility, performance, and scalability led to valuable insights on database design and management for data-intensive applications.
Outcomes and Next Steps
Still in progress!