Overview

This project develops an end-to-end machine learning framework for predicting Formula 1 lap times from real-time telemetry data. It uses engineered features for track geometry, elevation, and driver behavior, validated with statistical analysis. The model reaches an R² of 0.948, explaining 94.8% of the variance in lap times.

Repository: github.com/SidShah2953/F1-Telemetry-Analysis
Affiliation: Boston University, Department of Computer Science
Program: MS in Applied Data Analytics
Date: December 2024

Key Results

Model Performance

  • R² Score: 0.948 (94.8% of lap-time variance explained)
  • Features: 20+ engineered variables
  • Methodology: Advanced feature engineering with time-series quantization
  • Validation: ANOVA and statistical hypothesis testing
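R² is the share of lap-time variance the model explains. It is not a classification-style accuracy. An R² of 0.948 means the residual sum of squares is 5.2% of the total sum of squares around the mean lap time. Below is a minimal NumPy sketch of the computation on made-up lap times; scikit-learn's `r2_score` computes the same quantity.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean lap time
    return 1.0 - ss_res / ss_tot


# Toy lap times in seconds (made-up values, not project data)
actual = [90.0, 92.0, 94.0, 96.0]
predicted = [90.5, 91.5, 94.5, 95.5]
print(round(r_squared(actual, predicted), 3))  # 0.95
```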

Top 5 Most Influential Features

Feature                | Importance | Description
-----------------------|------------|------------------------------
Track Length           | 56.40%     | Total circuit distance
Elevation Std Dev      | 14.58%     | Vertical terrain variation
Total Elevation Change | 6.23%      | Cumulative altitude gain/loss
Curvature Std Dev      | 5.17%      | Track corner complexity
Number of Corners      | 4.16%      | Total turning points

Top 5 Combined: 86.54% of total feature importance

Project Objectives

Primary Goal

Build a robust predictive model that can forecast F1 lap times based on:

  • Track characteristics (geometry, elevation, surface)
  • Driver performance patterns
  • Car telemetry data
  • Environmental conditions

Research Questions

  1. Which track features most significantly impact lap times?
  2. How do driver styles differ in performance characteristics?
  3. Can we predict lap times for new circuits?
  4. What is the optimal feature set for prediction accuracy?

Data Sources

Telemetry Data

  • Real-time metrics: Speed, throttle, brake, steering angle
  • Frequency: High-resolution (millisecond-level)
  • Coverage: Multiple seasons, all circuits
  • Volume: Millions of data points

Track Characteristics

  • Geometric Data: GPS coordinates, corner angles, straight lengths
  • Elevation Profiles: Altitude changes, gradients
  • Surface Data: Track temperature, weather conditions
  • Configuration: Circuit layout, sector boundaries

Driver Performance

  • Historical Lap Times: Race and qualifying data
  • Sector Times: Granular performance breakdown
  • Team Information: Constructor, car specifications
  • Session Conditions: Practice, qualifying, race

Feature Engineering

Track Geometry Features

Curvature Analysis:

  • Average curvature
  • Standard deviation of curvature
  • Maximum curvature (tightest corner)
  • Curvature distribution (quantiles)
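The curvature statistics above could be computed as in the minimal sketch below. It assumes the centerline is available as evenly sampled x/y coordinates in meters, for example GPS positions projected to a local plane. The sketch uses the standard parametric curvature formula. The function and key names are hypothetical. Real GPS traces would need smoothing first, because second differences amplify noise.

```python
import numpy as np


def curvature_features(x, y):
    """Summary statistics of unsigned curvature along a sampled track centerline."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First and second derivatives with respect to sample index; the curvature
    # formula below does not depend on how the curve is parameterized.
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    kappa = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5  # 1/m if x, y are in meters
    q25, q50, q75 = np.quantile(kappa, [0.25, 0.5, 0.75])
    return {
        "curvature_mean": kappa.mean(),
        "curvature_std": kappa.std(),
        "curvature_max": kappa.max(),  # tightest point on the lap
        "curvature_q25": q25,
        "curvature_q50": q50,
        "curvature_q75": q75,
    }


# Sanity check on a circle of radius 100 m, whose curvature is 1/100 everywhere
t = np.linspace(0, 2 * np.pi, 2000)
feats = curvature_features(100 * np.cos(t), 100 * np.sin(t))
```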

Corner Characteristics:

  • Number of corners (total count)
  • Corner complexity score
  • Slow/medium/fast corner distribution
  • Corner entry/exit angles
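Given a curvature trace along the lap, one simple way to count corners is to count contiguous runs of samples above a curvature threshold. This sketch rests on assumptions: the threshold value and the function name `count_corners` are hypothetical. Classifying corners as slow, medium, or fast would also need the minimum speed through each run.

```python
import numpy as np


def count_corners(kappa, threshold, closed=True):
    """Count corners as contiguous runs of samples whose curvature exceeds `threshold`."""
    above = np.asarray(kappa, dtype=float) > threshold
    if not above.any():
        return 0
    if above.all():
        return 1
    # A corner begins wherever the mask switches from False to True
    n = int(np.count_nonzero(above[1:] & ~above[:-1])) + int(above[0])
    # On a closed lap, a corner straddling the start/finish line would otherwise be counted twice
    if closed and above[0] and above[-1]:
        n -= 1
    return n


kappa = np.array([0.0, 0.02, 0.03, 0.0, 0.0, 0.05, 0.0, 0.01, 0.02, 0.0])
print(count_corners(kappa, threshold=0.015))  # 3
```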

Elevation Features

Vertical Metrics:

  • Elevation Standard Deviation (14.58% importance)
  • Total Elevation Change (6.23% importance)
  • Maximum gradient (steepest climb/descent)
  • Elevation gain vs. loss

Impact: Elevation changes affect:

  • Engine power delivery
  • Aerodynamic efficiency
  • Tire wear patterns
  • Driver energy management
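The vertical metrics above reduce to a few array operations on an elevation profile. The sketch below assumes elevation samples in meters at known cumulative distances along the lap. The function and key names are hypothetical.

```python
import numpy as np


def elevation_features(distance_m, elevation_m):
    d = np.asarray(distance_m, dtype=float)
    z = np.asarray(elevation_m, dtype=float)
    dz = np.diff(z)
    grade = dz / np.diff(d)  # rise over run for each segment
    return {
        "elevation_std": z.std(),
        "total_elevation_change": np.abs(dz).sum(),  # cumulative gain plus loss
        "max_abs_gradient": np.abs(grade).max(),     # steepest climb or descent
        "elevation_gain": dz[dz > 0].sum(),
        "elevation_loss": -dz[dz < 0].sum(),
    }


feats = elevation_features([0, 100, 200, 300, 400], [10, 12, 15, 13, 10])
```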

Track Length & Layout

Distance Metrics:

  • Track Length (56.40% importance, the dominant factor)
  • Straight length (longest vs. average)
  • Sector length distribution
  • Track type (street vs. permanent circuit)
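Track length can be recovered from sampled centerline coordinates as the length of the closed polyline. This is a minimal sketch, assuming x/y positions in meters; `track_length` is a hypothetical helper.

```python
import numpy as np


def track_length(x, y, closed=True):
    """Polyline length of a sampled centerline, in the units of x and y."""
    pts = np.column_stack([x, y]).astype(float)
    if closed:
        pts = np.vstack([pts, pts[:1]])  # add the segment from the last sample back to the first
    seg = np.diff(pts, axis=0)
    return float(np.hypot(seg[:, 0], seg[:, 1]).sum())


print(track_length([0, 1, 1, 0], [0, 0, 1, 1]))  # 4.0
```

The estimate depends on sampling density. Sparse samples cut across corners and slightly underestimate the true length.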

Time-Series Quantization

Technique: Discretizing continuous telemetry signals

  • Speed binning (low, medium, high)
  • Throttle application quantiles
  • Brake pressure zones
  • Steering angle categories
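The two quantization styles listed above can be sketched in pandas. Speed uses fixed-edge bins (`pd.cut`) and throttle uses equal-count quantile bins (`pd.qcut`). The bin edges, labels, column names, and sample values are assumptions for illustration, not the project's values.

```python
import pandas as pd

# Made-up telemetry samples
tel = pd.DataFrame({
    "speed_kph": [85, 140, 210, 305, 260, 120],
    "throttle_pct": [10, 45, 80, 100, 100, 30],
})

# Fixed speed bands (edges are illustrative): (0, 150], (150, 250], (250, 400]
tel["speed_bin"] = pd.cut(tel["speed_kph"], bins=[0, 150, 250, 400],
                          labels=["low", "medium", "high"])

# Quantile bands: each throttle band holds roughly the same number of samples
tel["throttle_q"] = pd.qcut(tel["throttle_pct"], q=3, labels=["light", "partial", "full"])

# A per-lap feature: share of samples falling in each speed band
speed_share = tel["speed_bin"].value_counts(normalize=True, sort=False)
```

The share of samples approximates the share of lap time only when telemetry is sampled at a uniform rate.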

Benefits:

  • Captures non-linear relationships
  • Reduces noise in telemetry data
  • Enables pattern recognition
  • Improves model generalization

Driver-Specific Features

Performance Metrics:

  • Historical lap time averages
  • Qualifying vs. race pace differential
  • Tire degradation patterns
  • Consistency scores (lap time variance)
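Two of these metrics are shown below as a pandas sketch on made-up laps. Consistency is the standard deviation of race lap times. The qualifying-to-race pace differential is the difference of session means. Column names and values are hypothetical.

```python
import pandas as pd

# Made-up laps: two qualifying (Q) and two race (R) laps per driver
laps = pd.DataFrame({
    "driver": ["VER"] * 4 + ["NOR"] * 4,
    "session": ["Q", "Q", "R", "R"] * 2,
    "lap_time_s": [80.0, 80.2, 82.0, 82.4, 80.1, 80.5, 82.5, 83.1],
})

# Consistency measured on race laps only, so session pace differences do not inflate it
race = laps[laps["session"] == "R"]
summary = race.groupby("driver")["lap_time_s"].agg(race_mean_s="mean", race_std_s="std")

# Mean lap time per driver and session, then race minus qualifying
pace = laps.pivot_table(index="driver", columns="session", values="lap_time_s", aggfunc="mean")
summary["race_minus_quali_s"] = pace["R"] - pace["Q"]
```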

Driving Style Indicators:

  • Aggressive vs. smooth braking
  • Corner entry speed preferences
  • Throttle application patterns
  • Energy management strategies

Statistical Analysis

ANOVA (Analysis of Variance)

Purpose: Determine which features significantly impact lap times

Methodology:

  • F-statistic calculation for each feature
  • p-value analysis (significance testing)
  • Effect size quantification
  • Multiple comparison corrections
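For continuous features, one common way to get a per-feature F-statistic is the F-test of a one-predictor linear regression. This is what scikit-learn's `f_regression` computes. The sketch below does the same with NumPy and SciPy and reports r² as an effect size. It applies a Bonferroni correction as one example of a multiple-comparison adjustment. The data are synthetic and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def univariate_f_tests(X, y, alpha=0.01):
    """Per-feature F-test of a one-predictor linear fit, with Bonferroni-adjusted p-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    rows = []
    for j in range(m):
        r = np.corrcoef(X[:, j], y)[0, 1]
        eta_sq = r**2                               # effect size: share of lap-time variance explained
        f_stat = eta_sq / (1 - eta_sq) * (n - 2)    # F with (1, n - 2) degrees of freedom
        p = stats.f.sf(f_stat, 1, n - 2)
        p_adj = min(p * m, 1.0)                     # Bonferroni correction across m features
        rows.append({"feature": j, "F": f_stat, "p_adj": p_adj,
                     "eta_sq": eta_sq, "reject_H0": p_adj < alpha})
    return rows


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # column 0 informative, column 1 pure noise
y = 3 * X[:, 0] + rng.normal(size=200)
results = univariate_f_tests(X, y)
```

Bonferroni is conservative. With many correlated track features, a less strict procedure such as Holm or Benjamini-Hochberg may retain more features.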

Key Findings:

  • Track length: Highest F-statistic
  • Elevation features: Statistically significant
  • Curvature metrics: Strong predictive power

Hypothesis Testing

Null Hypothesis (H₀): Feature has no effect on lap time
Alternative Hypothesis (H₁): Feature significantly affects lap time

Results:

  • Rejected H₀ for top 15 features (p < 0.01)
  • Strong evidence for track geometry impact
  • Validated feature selection methodology

Multi-Dimensional Feature Interactions

Complex Interactions Engineered:

  • Track length × curvature (handling vs. straight-line speed)
  • Elevation change × number of corners (energy management)
  • Driver consistency × track complexity
  • Temperature × tire compound × track abrasiveness

These interactions capture non-linear effects that simple features miss.
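Interaction terms are element-wise products of existing columns. The sketch below uses made-up per-circuit values, and the column names are hypothetical.

```python
import pandas as pd

# Illustrative per-circuit features (made-up values)
features = pd.DataFrame({
    "track_length_m": [5000.0, 3500.0, 7000.0],
    "curvature_std": [0.012, 0.021, 0.009],
    "elevation_change_m": [40.0, 42.0, 102.0],
    "n_corners": [15, 19, 19],
})

# Products let a linear model give one feature a different slope depending on another
features["length_x_curvature"] = features["track_length_m"] * features["curvature_std"]
features["elevation_x_corners"] = features["elevation_change_m"] * features["n_corners"]
```

Tree ensembles can learn interactions through successive splits. Explicit products therefore matter most for linear models such as the regressions used in the driver comparison.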

Driver Comparative Analysis

Max Verstappen vs. Lando Norris

Statistical Comparison using multiple linear regression models to analyze driving styles and performance characteristics:

Track Complexity Management

Verstappen Advantages:

  • More consistent performance on complex tracks
  • Lower sensitivity to number of corners
  • Lower sensitivity to maximum curvature
  • More efficient adaptation to track characteristics

Interpretation: Superior technical skill in handling varied circuit types
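In a multiple linear regression, one way to quantify "lower sensitivity to number of corners" is a driver × feature interaction term. Its coefficient is the difference in per-corner slope between the two drivers. The sketch below fits such a model with NumPy least squares on synthetic data with a planted difference. It illustrates the method only and does not reproduce the project's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
corners = rng.integers(10, 25, size=n).astype(float)
is_ver = np.repeat([0.0, 1.0], n // 2)  # indicator: 0 = NOR, 1 = VER (synthetic labels)

# Synthetic lap times with a planted -0.3 s/corner slope difference for VER
lap_time = 60 + 1.5 * corners - 0.3 * corners * is_ver + rng.normal(0, 0.2, n)

# Design matrix: intercept, driver offset, shared corner slope, driver x corners interaction
X = np.column_stack([np.ones(n), is_ver, corners, corners * is_ver])
coef, *_ = np.linalg.lstsq(X, lap_time, rcond=None)
intercept, ver_offset, corner_slope, ver_slope_diff = coef
```

A negative `ver_slope_diff` means each extra corner costs the indicated driver less time. In statsmodels, the formula `lap_time ~ n_corners * C(driver)` specifies the same design.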

Tire Management

Verstappen Strengths:

  • Lower lap time degradation with tire wear
  • More consistent performance across tire compounds
  • Better tire preservation in race conditions

Impact: Strategic advantage in race simulations and pit stop strategy
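Lap-time degradation can be summarized as the slope of lap time against tire age within a stint. A minimal sketch with a hypothetical helper:

```python
import numpy as np


def degradation_rate(tire_age_laps, lap_time_s):
    """Slope of a least-squares line of lap time on tire age, in seconds per lap of tire age."""
    slope, _intercept = np.polyfit(tire_age_laps, lap_time_s, deg=1)
    return float(slope)


age = np.arange(1, 11)
print(round(degradation_rate(age, 90.0 + 0.05 * age), 3))  # 0.05
```

Raw race lap times also fall as fuel burns off, which partly masks tire wear. A fuel correction would be needed before comparing drivers on this slope.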

Performance Factors

Key Differences:

  • Different sensitivity to track temperatures
  • Varying responses to weather conditions
  • Distinct optimal setup preferences

Application: Team strategy optimization and car setup directions

Machine Learning Model

Algorithm Selection

Model Type: Gradient Boosting Regressor (XGBoost/LightGBM)

Rationale:

  • Handles non-linear feature interactions
  • Feature importance extraction
  • Robust to outliers
  • High predictive accuracy for tabular data

Model Architecture

  • Input: 20+ engineered features
  • Training strategy: K-fold cross-validation
  • Optimization: Hyperparameter tuning (grid search)
  • Validation: Hold-out test set plus temporal split
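The training setup described above can be sketched with scikit-learn. Here `GradientBoostingRegressor` stands in for XGBoost/LightGBM; `xgboost.XGBRegressor` follows the scikit-learn estimator interface and could be swapped in. The data are synthetic and the grid values are assumptions. Taking the last 20% of rows as the test set acts as a temporal split only if rows are ordered in time.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))  # stand-in for the engineered feature matrix
y = 80 + 5 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.5, 300)

# Hold out the last 20% of rows (a temporal split when rows are time-ordered)
split = int(0.8 * len(y))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Grid search with 5-fold cross-validation on the training portion only
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X_train, y_train)

# The refit best model is scored once on the untouched hold-out set
test_r2 = r2_score(y_test, search.predict(X_test))
```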

Feature Importance Analysis

Method: SHAP (SHapley Additive exPlanations) values

Insights:

  • Track length dominates (56.40%)
  • Elevation features collectively contribute 20.81%
  • Curvature and corner metrics (curvature std dev plus number of corners) add 9.33%
  • Driver-specific features: 8-10%
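Percent importances like these are commonly obtained by normalizing mean absolute SHAP values per feature. Whether the project used exactly this normalization is an assumption. The sketch takes a precomputed SHAP matrix (samples × features), so it does not depend on the `shap` package. It also shows how the elevation and curvature group totals follow from the top-5 table.

```python
import numpy as np


def importance_shares(shap_values, feature_names):
    """Percent share of mean |SHAP| per feature (shap_values: n_samples x n_features)."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    return dict(zip(feature_names, 100 * mean_abs / mean_abs.sum()))


# Group the reported top-5 shares (percent) into categories
reported = {"track_length": 56.40, "elevation_std": 14.58, "total_elevation_change": 6.23,
            "curvature_std": 5.17, "n_corners": 4.16}
groups = {"elevation": ["elevation_std", "total_elevation_change"],
          "curvature_and_corners": ["curvature_std", "n_corners"]}
group_totals = {g: round(sum(reported[f] for f in fs), 2) for g, fs in groups.items()}
print(group_totals)  # {'elevation': 20.81, 'curvature_and_corners': 9.33}
```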

Technical Implementation

Python Stack

Core Libraries:

  • Pandas: Data manipulation and time-series operations
  • NumPy: Numerical computations
  • Scikit-learn: Machine learning models and metrics
  • XGBoost: Gradient boosting implementation
  • Matplotlib/Seaborn: Data visualization

Statistical Analysis:

  • SciPy: ANOVA, hypothesis testing
  • Statsmodels: Regression analysis
  • SHAP: Feature importance interpretation

Data Pipeline

  1. Data Ingestion: Load telemetry and track data
  2. Feature Engineering: Create 20+ derived features
  3. Time-Series Quantization: Discretize continuous signals
  4. Data Normalization: Standardize feature scales
  5. Train/Test Split: Temporal and circuit-based splits
  6. Model Training: Hyperparameter optimization
  7. Evaluation: R², RMSE, MAE metrics
  8. Interpretation: SHAP analysis and visualizations
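Step 5 can be sketched with scikit-learn's `GroupKFold` for circuit-based folds and a boolean mask for the temporal split. In each `GroupKFold` fold, no circuit appears in both the training and test rows, which is what predicting lap times for new circuits requires. Circuit labels and seasons here are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical per-lap metadata: which circuit and season each row comes from
circuit = np.array(["A", "A", "B", "B", "C", "C", "D", "D"])
season = np.array([2022, 2023, 2022, 2023, 2022, 2023, 2022, 2023])
X = np.arange(16, dtype=float).reshape(8, 2)

# Circuit-based split: each fold's test circuits are absent from its training rows
circuit_folds = list(GroupKFold(n_splits=4).split(X, groups=circuit))

# Temporal split: train on earlier seasons, test on the latest one
train_mask = season < season.max()
X_train, X_test = X[train_mask], X[~train_mask]
```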

Code Structure

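# Simplified workflow: stages run in this order
load_telemetry_data()                    # 1. ingest telemetry and track data
engineer_track_features()                # 2. derive geometry, elevation, and length features
quantize_time_series()                   # 3. discretize continuous telemetry signals
build_driver_features()                  # 4. driver performance and style features
train_model(features, target=lap_time)   # 5. fit with cross-validated hyperparameter search
evaluate_performance(test_set)           # 6. R², RMSE, MAE on held-out data
analyze_feature_importance()             # 7. SHAP-based importance
visualize_predictions()                  # 8. plots of predicted vs. actual lap times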

Applications

Racing Teams (F1 Constructor Applications)

Strategy Optimization:

  • Lap time predictions for circuit planning
  • Driver-circuit matching analysis
  • Tire strategy simulations
  • Pit stop timing optimization

Car Development:

  • Understanding which car characteristics matter most
  • Prioritizing aerodynamic vs. mechanical grip
  • Optimizing for specific circuit types

Sports Analytics

Broadcasting & Media:

  • Real-time prediction graphics
  • Performance comparison visualizations
  • Insightful commentary support

Fantasy Sports:

  • Driver performance forecasting
  • Optimal team selection
  • Risk assessment for picks

Data Science Showcase

Transferable Skills:

  • Feature Engineering: Complex interaction terms
  • Time-Series Analysis: Quantization techniques
  • Statistical Rigor: ANOVA, hypothesis testing
  • Model Interpretation: SHAP analysis
  • Domain Expertise: F1 racing knowledge integration

Insights & Discoveries

Track Length Dominance (56.40%)

  • Longer tracks produce longer lap times (intuitive, but now quantified)
  • Serves as baseline normalization factor
  • Other features explain residual variation

Elevation Impact (20.81% combined)

  • Underestimated factor in lap time prediction
  • Affects engine load and aerodynamics
  • More important than raw corner count

Curvature and Corner Complexity (9.33%)

  • Standard deviation more important than mean
  • Track-to-track variation matters
  • Technical circuits favor skilled drivers

Driver Differences (Verstappen vs. Norris)

  • Quantified performance gaps on complex tracks
  • Tire management measurably different
  • Temperature/weather sensitivity varies

Challenges & Solutions

Challenge 1: High-Dimensional Data

Solution: Feature selection via ANOVA and recursive feature elimination
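Below is a minimal recursive feature elimination sketch using scikit-learn's `RFE` on synthetic data. RFE repeatedly refits the estimator and drops the feature with the smallest absolute coefficient. Features should therefore be on comparable scales, as they are after standardization. The target feature count and data are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))  # six standardized candidate features
y = 4 * X[:, 0] + 2 * X[:, 3] + rng.normal(0, 0.1, 200)

# Drop one feature per round until two remain
selector = RFE(LinearRegression(), n_features_to_select=2, step=1).fit(X, y)
print(np.flatnonzero(selector.support_))  # [0 3]
```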

Challenge 2: Non-Linear Relationships

Solution: Time-series quantization and interaction terms

Challenge 3: Driver Heterogeneity

Solution: Driver-specific features and mixed-effects modeling

Challenge 4: Overfitting Risk

Solution: Cross-validation, regularization, and temporal splits

Future Enhancements

Planned Additions

  • Real-time predictions: Live race lap time forecasting
  • Strategy simulation: Pit stop and tire strategy optimization
  • Weather integration: Rain impact on lap times
  • Machine learning ensemble: Combining multiple models

Advanced Features

  • Tire compound effects (soft vs. medium vs. hard)
  • Fuel load degradation curves
  • DRS (Drag Reduction System) impact
  • Traffic and overtaking difficulty

Deep Learning Exploration

  • LSTM for sequential telemetry data
  • CNN for circuit image analysis
  • Transformer models for attention-based predictions

Conclusion

This project demonstrates end-to-end data science expertise in a complex, real-world domain:

  • R² of 0.948 (94.8% of lap-time variance explained) through rigorous feature engineering
  • Statistical validation via ANOVA and hypothesis testing
  • Interpretable models with SHAP feature importance
  • Domain integration combining F1 knowledge with ML techniques

The framework is transferable to financial modeling (trading strategies), sports analytics (performance prediction), and any time-series regression problem requiring sophisticated feature engineering.

Key Takeaways

  1. Track length is the dominant predictor (56.40%)
  2. Elevation variation significantly impacts lap times (20.81%)
  3. Driver characteristics create measurable performance differences
  4. Advanced feature engineering unlocks predictive accuracy

Repository

GitHub: SidShah2953/F1-Telemetry-Analysis

Contents:

  • Complete data pipeline code
  • Feature engineering notebooks
  • Statistical analysis scripts
  • Model training and evaluation
  • Visualization tools
  • Documentation and results

This project showcases the intersection of machine learning, statistical analysis, and domain expertise: skills directly applicable to quantitative finance and data-driven decision making.