All Projects

Data Science Portfolio

About Me

Sergey Mouzykin

B.S. Mechanical Engineering

Data Science
Machine Learning
Python

Feel free to reach out or follow

A compilation of personal and online course projects. As a whole, this portfolio aims to demonstrate experience in Data Science principles, including obtaining and cleaning data, building Extract, Transform, Load (ETL) pipelines, performing Exploratory Data Analysis (EDA), and building and validating Machine Learning models.

Predicting Building Energy Use

Feb 2020 – Mar 2020

This competition aims to build counterfactual models that predict buildings' energy usage. A successful model should scale well and minimize the Root Mean Squared Log Error (RMSLE). A counterfactual model estimates what a building's energy usage would have been without any improvements. This estimate is then compared with the actual energy usage after the improvements to calculate the energy savings and confirm that the improvements are in fact working.
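
As a rough illustration of the metric named above, the following sketch computes the Root Mean Squared Log Error with the Python standard library only. The function name `rmsle` and the meter readings are hypothetical and are not taken from the competition data; the metric uses log(1 + x) so that zero readings are handled.

```python
import math


def rmsle(actual, predicted):
    """Root Mean Squared Log Error for non-negative values.

    Illustrative helper (hypothetical name), not the competition's scorer.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), which stays defined when a reading is 0.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up meter readings, for illustration only.
actual_kwh = [120.0, 80.0, 0.0, 310.0]
predicted_kwh = [110.0, 95.0, 5.0, 290.0]
score = rmsle(actual_kwh, predicted_kwh)
print(f"RMSLE: {score:.4f}")
```

Because the error is taken on a log scale, the metric penalizes relative rather than absolute differences, which keeps large buildings from dominating the score.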

Article Recommendations

Jan 2020

This project focuses on analyzing interactions between users and articles on the IBM Watson Studio platform. New article recommendations are made to users based on their interactions with articles. Based on the data available, we can use various methods to make these recommendations. The methods used here are Rank Based, Collaborative Filtering, and Matrix Factorization.
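
The simplest of the three methods, rank-based recommendation, can be sketched as follows. This is a minimal example under stated assumptions, not the project's actual code: the function name, the `(user, article)` tuple format, and the sample interactions are all hypothetical.

```python
from collections import Counter


def rank_based_recommendations(interactions, user_id, n=3):
    """Return up to n of the most-interacted-with articles the user has not seen.

    `interactions` is assumed to be a list of (user, article) pairs.
    """
    # Articles this user has already interacted with.
    seen = {article for user, article in interactions if user == user_id}
    # Popularity = total number of interactions per article.
    popularity = Counter(article for _, article in interactions)
    ranked = [
        article
        for article, _ in popularity.most_common()
        if article not in seen
    ]
    return ranked[:n]


# Made-up interaction log, for illustration only.
interactions = [
    ("u1", "a1"), ("u1", "a2"),
    ("u2", "a1"), ("u2", "a3"),
    ("u3", "a3"), ("u3", "a4"),
    ("u4", "a3"),
]
print(rank_based_recommendations(interactions, "u1", n=2))
```

A user with no history simply receives the overall most popular articles, which is why a rank-based approach is a common fallback when collaborative filtering has nothing to work with.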

Disaster Response Pipeline | Web-App

Nov 2019
  • Utilized frameworks such as NLTK and Scikit-Learn to perform ETL, build an ML pipeline, and deploy the ML model to a local web application.
  • The ML pipeline processes 26,000 raw text messages using NLTK and Scikit-Learn to build a multi-output classification model.
  • Maximized F1 score through feature engineering and parameter tuning.
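
To make the F1 objective concrete for a multi-output classifier, here is a small sketch that scores each output column separately and averages the results. The helper names and the label matrices are hypothetical, and averaging per column is only one possible way to aggregate; the project may have used a different scheme.

```python
def f1_score_binary(y_true, y_pred):
    """F1 score for one binary output column (labels are 0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_per_output(Y_true, Y_pred):
    """Average the per-column F1 over all output columns."""
    # zip(*rows) transposes a list of rows into a list of columns.
    scores = [
        f1_score_binary(true_col, pred_col)
        for true_col, pred_col in zip(zip(*Y_true), zip(*Y_pred))
    ]
    return sum(scores) / len(scores)


# Made-up labels: each row is one message, each column one category.
Y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
mean_f1 = mean_f1_per_output(Y_true, Y_pred)
print(f"Mean F1 across outputs: {mean_f1:.3f}")
```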

Solar Array Cost Prediction | Medium Blog

May 2019 – Jul 2019
  • Analysis revealed that national average installation costs have declined by about 30% since the 2009 peak.
  • Predicted the cost of a residential solar panel array using an ML pipeline built with Scikit-Learn.
  • Evaluated multiple regression models (Ridge, RandomForest, GradientBoosting) and optimized the pipeline to minimize the RMSE and MAE.

Data Dashboard

Oct 2019

Dashboards developed using Flask, Plotly, Pandas, and NumPy. Data acquired from the World Bank is cleaned and transformed into a usable form with Pandas. Flask is used to render an HTML template for the visualizations, which are built with Plotly.

Facial Recognition with OpenCV

Apr 2019 – May 2019

Scans a newspaper article and images, looking for occurrences of specified keywords and detecting faces. OpenCV is used to detect faces, Tesseract to perform optical character recognition, and PIL to assemble the resulting images onto a new contact sheet.