Portfolio

I’m a firm believer that great things happen when curiosity meets hard work. Here, I’ve brought together projects that reflect my journey of learning, growing, and solving real-world challenges with data.

Ultimate Fighting Championship (UFC) Trends & Fights Outcomes Prediction

GitHub repository:

As an avid MMA (Mixed Martial Arts) fan, I’ve always been captivated by the dynamic nature of UFC events, which inspired me to explore how data analytics could uncover patterns and predict fight outcomes. The objective of this project was to analyze trends in UFC events, fighters, and fight results, while leveraging regression modeling to predict outcomes based on fighter statistics and historical performance.

We programmatically downloaded the data from Kaggle using the Kaggle API.
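
A minimal sketch of such a download step, assuming the official `kaggle` command-line tool is installed and an API token is configured. The dataset slug below is a hypothetical placeholder, since the actual dataset link is not given here, and the helper names are illustrative only.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholder: the real dataset slug is not stated in this write-up.
DATASET_SLUG = "owner/dataset-name"


def build_download_command(slug: str, target_dir: str) -> list[str]:
    """Build the Kaggle CLI call that downloads and unzips one dataset."""
    return ["kaggle", "datasets", "download", "-d", slug, "-p", target_dir, "--unzip"]


def download_dataset(slug: str, target_dir: str = "data") -> None:
    """Run the download. Requires the kaggle CLI and a configured API token."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(slug, target_dir), check=True)


# Example call (not executed here because it needs credentials and network access):
# download_dataset(DATASET_SLUG)
```

Keeping the command construction separate from the call makes the step easy to inspect without touching the network.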

Link to the datasets:

OBJECTIVES:

  1. Analyze trends in UFC events, fighters, and fight outcomes over time.

  2. Perform regression analysis to predict fight outcomes or other relevant metrics.

  3. Create actionable insights and compelling visualizations to present findings.

RESEARCH QUESTIONS:

  1. What trends can be observed in UFC events over time?

  2. Which weight classes and fighters are most dominant?

  3. What are the most common methods of victory?

  4. How do fighter statistics correlate with victory?

  5. Can we predict fight outcomes using regression analysis?

PROPOSED HYPOTHESES:

  1. The number of UFC events has steadily increased over time, with specific locations hosting more events due to their popularity or regional growth of the sport.

  2. Specific weight classes, such as Lightweight and Heavyweight, have the most fights and victories due to their historical significance and popularity. Certain fighters will have significantly higher win ratios.

  3. KO/TKO is the most common victory method in the UFC, especially in heavier weight classes.

  4. Fighters with longer reach and younger age are more likely to win due to physical advantages and endurance.

  5. Fighters with longer reach, higher win ratios, and lighter weight classes are more likely to win by KO/TKO.

EXPLORATORY DATA ANALYSIS (EDA)

Before diving into the analysis, I explored the datasets to understand their structure and content. This initial exploration helped identify the key variables that guide the analysis. I then preprocessed the data by handling missing values and outliers and by transforming variables to make them suitable for regression modeling.

Question 1: What trends can be observed in UFC events over time?
Hypothesis: The number of UFC events has steadily increased over time, with specific locations hosting more events due to their popularity or regional growth of the sport.

Screenshot 2025-01-02 at 19.38.10.png

This line chart illustrates a significant increase in the number of UFC events held each year. It clearly highlights the growing popularity and expansion of the UFC over time. The upward trend demonstrates how the organization has steadily increased its event frequency, potentially reflecting the rising global interest in MMA.
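
The yearly counts behind a chart like this can be sketched as follows. The dates are made-up sample values, not rows from the actual events dataset, and the function name is illustrative.

```python
from collections import Counter
from datetime import date

# Made-up sample dates standing in for the event dates in the dataset.
event_dates = [
    date(2004, 4, 2),
    date(2005, 4, 16),
    date(2005, 8, 6),
    date(2006, 3, 4),
    date(2006, 7, 8),
    date(2006, 12, 30),
]


def events_per_year(dates):
    """Return {year: number of events}, ordered by year."""
    counts = Counter(d.year for d in dates)
    return dict(sorted(counts.items()))


print(events_per_year(event_dates))  # {2004: 1, 2005: 2, 2006: 3}
```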

Screenshot 2025-01-02 at 19.41.29.png

Observation:
The sharp increase in UFC events from 2005 onward likely reflects the sport's growing popularity, driven by stronger marketing, the rise of high-profile fighters, and wider global media coverage. This period marked a surge in UFC visibility, particularly in mainstream media, and the sport became more accessible through streaming services. Las Vegas, with its strong sports entertainment infrastructure, remains the top location for UFC events, while Abu Dhabi's increasing presence reflects the sport's expansion into new global markets.

Question 2: Which weight classes and fighters are most dominant?
Hypothesis: Specific weight classes, such as Lightweight and Heavyweight, have the most fights and victories due to their historical significance and popularity. Certain fighters will have significantly higher win ratios.

Screenshot 2025-01-02 at 19.46.59.png

Observation:

The bar chart of UFC fights per weight class partially supports our hypothesis. Lightweight has the highest number of fights, as predicted, but it is followed by Welterweight and Middleweight rather than Heavyweight. This trend can be attributed to the historical significance and popularity of these divisions, which often feature fast-paced, action-packed bouts that attract a large fan base. The Lightweight class in particular is known for its depth of talent and has consistently been a central focus of UFC events.

Screenshot 2025-01-02 at 19.50.23.png

We then dug deeper and looked at the top 20 fighters by win ratio over the past five years.

Screenshot 2025-01-02 at 19.53.23.png

Observation:

The bar chart of the top 20 UFC fighters by win ratio over the last five years shows that the top 18 fighters have nearly identical win ratios. This group includes high-profile fighters such as Ilia Topuria, Dricus Du Plessis, Khamzat Chimaev, and Islam Makhachev. Their similar win ratios reflect sustained dominance within their divisions. This supports the second part of the hypothesis, that certain fighters have significantly higher win ratios than the rest of the roster. The chart alone does not show whether these fighters are concentrated in particular weight classes such as Lightweight or Welterweight.
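
A win-ratio ranking of this kind could be computed roughly as below. The records are invented and the `min_fights` threshold is an assumption added for illustration. Without some such threshold, fighters with only one or two wins in the window all tie at a ratio of 1.0, which may be one reason many ratios at the top look identical.

```python
from collections import defaultdict

# Invented records: (fighter, year, won). These are not real fight results.
fights = [
    ("Fighter A", 2021, True),
    ("Fighter A", 2023, True),
    ("Fighter B", 2022, True),
    ("Fighter B", 2024, False),
    ("Fighter C", 2015, True),  # falls outside the five-year window
]


def win_ratios(records, since_year, min_fights=1):
    """Rank fighters by wins / fights, counting only fights from since_year on."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for fighter, year, won in records:
        if year < since_year:
            continue
        total[fighter] += 1
        wins[fighter] += int(won)
    ratios = {f: wins[f] / total[f] for f in total if total[f] >= min_fights}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranking = win_ratios(fights, since_year=2020, min_fights=2)
print(ranking)  # [('Fighter A', 1.0), ('Fighter B', 0.5)]
```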

​

Question 3: What are the common methods of victory?
Hypothesis: KO/TKO is the most common victory method in the UFC, especially in heavier weight classes.

Screenshot 2025-01-02 at 19.57.48.png

Observation:

According to the data, Unanimous Decision (U-DEC) is the most common method of victory in the UFC, with 2609 victories. That is more than the two KO/TKO categories combined: KO/TKO Punches (914) and KO/TKO Punch (812) together account for 1726 victories. KO/TKO methods are significant, but U-DEC is clearly the leading victory method overall. This analysis does not support the hypothesis that KO/TKO is the most common method overall, though a closer look at specific weight classes may reveal a stronger prevalence of KO/TKO in heavier divisions.
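
The grouping of KO/TKO variants can be sketched with the three counts quoted above. All other victory methods are omitted, and the prefix-based grouping rule is an assumption about how the labels are structured.

```python
from collections import Counter

# Counts quoted in the observation; all other methods are left out.
method_counts = Counter({"U-DEC": 2609, "KO/TKO Punches": 914, "KO/TKO Punch": 812})


def group_method(label):
    """Collapse every 'KO/TKO ...' variant into a single 'KO/TKO' bucket."""
    return "KO/TKO" if label.startswith("KO/TKO") else label


grouped = Counter()
for label, count in method_counts.items():
    grouped[group_method(label)] += count

print(grouped.most_common())  # [('U-DEC', 2609), ('KO/TKO', 1726)]
```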

​

Screenshot 2025-01-02 at 20.00.01.png

Observation:

The heatmap of the top 10 victory methods across the top 10 weight classes shows that KO/TKO is the most common victory method in the Heavyweight and Light Heavyweight divisions, in line with the hypothesis that heavier fighters tend to end fights by knockout because of their power. Unanimous Decision (U-DEC) dominates the lighter divisions, especially Lightweight and Welterweight, where fighters are more likely to go the distance and have their fights decided by the judges. This suggests that lighter fighters often engage in more tactical, longer bouts, while heavier fighters rely on knockout power. Overall, the hypothesis holds for the heavier divisions, but U-DEC remains the most frequent victory method across all weight classes combined.
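
The table behind such a heatmap is a cross-tabulation of weight class against victory method. A small sketch with invented rows, using illustrative names, shows how the most common method per division can be read from it.

```python
from collections import Counter, defaultdict

# Invented rows: (weight_class, method). These are not real fight results.
rows = [
    ("Heavyweight", "KO/TKO"),
    ("Heavyweight", "KO/TKO"),
    ("Heavyweight", "U-DEC"),
    ("Lightweight", "U-DEC"),
    ("Lightweight", "U-DEC"),
    ("Lightweight", "KO/TKO"),
    ("Lightweight", "SUB"),
]


def crosstab(pairs):
    """Count how often each method occurs within each weight class."""
    table = defaultdict(Counter)
    for weight_class, method in pairs:
        table[weight_class][method] += 1
    return table


table = crosstab(rows)
top_method = {wc: counts.most_common(1)[0][0] for wc, counts in table.items()}
print(top_method)  # {'Heavyweight': 'KO/TKO', 'Lightweight': 'U-DEC'}
```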

​

Question 4: How do fighter statistics correlate with victory?

Hypothesis: Fighters with longer reach and younger age are more likely to win due to physical advantages and endurance.

Observation:

Unfortunately, the dataset used so far does not contain fighter statistics such as reach and age, so we needed an additional dataset to address this research question.

Luckily, we found a fighter statistics dataset on Kaggle. Exploring it revealed a significant number of missing values, particularly in the 'reach' column, where almost half of the entries are missing. Simply dropping the rows with missing values would therefore discard a large part of the data.

As a solution, I replaced the missing values in each affected column with that column's mean.
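
Mean imputation for a single column can be sketched as follows. The reach values are made up, and missing entries are represented as `None` for illustration. Filling close to half of a column with one value shrinks its variance, which can weaken any correlation measured on it afterwards, so results for reach should be read with that in mind.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up reach values in centimetres, with two missing entries.
reach_cm = [180.0, None, 190.0, None, 185.0]
print(impute_mean(reach_cm))  # [180.0, 185.0, 190.0, 185.0, 185.0]
```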

​

After preprocessing, we proceed to the data analysis.

Screenshot 2025-01-02 at 20.06.26.png

Observation:

The correlation matrix reveals weak relationships between fighter statistics and the number of wins. Age shows a slight positive correlation with wins (0.18). Because the correlation is positive, it means older fighters tend to have more wins, plausibly because they have had longer careers and more fights, so it does not support the hypothesis that younger fighters win more. Reach has an even weaker correlation with wins (0.07), which challenges the notion that longer reach significantly impacts victory. Weight shows no meaningful relationship with wins (-0.02), indicating that being heavier does not confer a notable advantage. The stronger correlations between weight and reach (0.48) and between weight and age (0.36) show that heavier fighters tend to be older and have longer reach, but these attributes do not strongly predict success. Thus, while physical attributes like age and reach may play minor roles, skill, strategy, and experience are likely the primary determinants of victory in the UFC.
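
Each cell of a correlation matrix is a Pearson coefficient between two columns. The sketch below computes one by hand on invented values. A positive result means the two quantities rise together, and a negative one means one falls as the other rises.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values: wins rise with age here, so the coefficient is positive.
ages = [22, 25, 28, 31, 34]
wins = [3, 5, 4, 8, 9]
print(round(pearson(ages, wins), 2))
```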

​

We then used a linear regression model to quantify how well each attribute predicts the number of wins.

Screenshot 2025-01-02 at 20.08.13.png

Observation:

The regression results indicate that the relationship between fighter statistics (reach, age, weight) and the number of wins is relatively weak. Age has the largest positive coefficient (0.478), consistent with its positive correlation with wins (0.18): older fighters tend to have more wins, which runs against the idea that younger fighters hold the advantage. Reach has a very small positive coefficient (0.045), suggesting that longer reach might provide a slight advantage but is not a decisive factor, consistent with the weak correlation (0.07). Weight has a negligible, slightly negative coefficient (-0.018), further confirming that being heavier does not contribute significantly to winning, as the correlation matrix also indicated (-0.02). The low R-squared value (0.12) means that only 12% of the variance in wins is explained by these attributes, so other factors, such as skill, strategy, and experience, play a much larger role. These findings answer the research question by showing only a limited relationship between physical attributes and victories. They offer little support for the hypothesis: age has some influence, but in the opposite direction to the one proposed, and reach and weight are not strongly predictive of success.
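
To make the R-squared figure concrete, here is a simplified sketch with a single predictor and invented data. The project's model used three predictors, so this is not the same model. R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. Raw coefficients also depend on each variable's units, so comparing them directly is only meaningful after the predictors are put on a common scale.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Share of the variance in ys that the predictions explain."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented values, not taken from the fighter dataset.
ages = [22, 25, 28, 31, 34]
wins = [3, 5, 4, 8, 9]
slope, intercept = fit_line(ages, wins)
predicted = [slope * a + intercept for a in ages]
print(round(slope, 3), round(r_squared(wins, predicted), 3))
```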

​

Question 5: Can we predict fight outcomes using regression?

Hypothesis: Fighters with longer reach, higher win ratios, and lighter weight classes are more likely to win by KO/TKO.

Screenshot 2025-01-02 at 20.10.50.png

Observation:

The logistic regression model achieved an accuracy of 85.84%, with a precision of 0.86 and a recall of 0.70 for predicting KO/TKO outcomes (class 1). The model identified non-KO/TKO outcomes (class 0) with high precision and recall, but the recall of 0.70 for class 1 means it missed about 30% of actual KO/TKO finishes. In other words, the model is relatively effective at predicting non-KO/TKO outcomes but struggles more with KO/TKO victories. Given these results, the hypothesis that "fighters with longer reach, higher win ratios, and lighter weight classes are more likely to win by KO/TKO" is not fully supported, especially since the earlier analysis found reach to be only weakly correlated with wins. Other factors that were not included in the model, such as fighter skill, strategy, and experience, could play a more significant role in predicting KO/TKO outcomes.

For future improvements, we could incorporate more advanced features such as historical performance data, fighter conditioning, or fighting style, and explore other machine learning algorithms like Random Forest or Gradient Boosting to see if they provide better predictive power.
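
For reference, accuracy, precision, and recall all derive from the same four confusion-matrix counts. The sketch below uses invented labels, not the model's actual predictions, with class 1 standing for a KO/TKO finish.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented labels: 1 = KO/TKO finish, 0 = any other outcome.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

Precision answers how many predicted KO/TKOs were real, while recall answers how many real KO/TKOs were found, which is why a model can score well on one and worse on the other.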

​

The project ends here. You can view the Tableau visualization at the following link: Tableau

To check my other projects: Portfolio
