Portfolio

Skills honed over time, reflecting versatility and proficiency.

LANGUAGES & LIBRARIES

Python

SQL

SAS

MATLAB

scikit-learn

TensorFlow

Keras

XGBoost

NLTK / spaCy

DATABASES & BIG DATA

MySQL

PostgreSQL

Amazon Redshift

Snowflake

MongoDB

Hadoop (HDFS)

HiveQL

Apache Spark

PySpark

MS SQL Server

CLOUD & TOOLS

AWS (S3, EC2, Glue, Athena)

Azure (Basics)

Docker

Apache Airflow

Git

GitHub

Tableau

Power BI

VS Code

Jupyter Notebook

Experience

Providence Health & Services, Data Scientist

Developed and deployed ML models to predict patient outcomes and reduce readmissions by 18%. Built NLP pipelines with spaCy and Transformers for medical text mining. Automated data workflows with Spark and Airflow, and created real-time dashboards using Tableau and Power BI.

BNY Mellon, Data Scientist

Tackled imbalanced fraud data using SMOTE and ensemble models. Engineered features and built models to predict loan defaults. Worked in a Spark and AWS environment to scale ML workflows and improve fraud detection accuracy by 22%.

ManipalCigna Health Insurance, Data Analyst / Data Scientist

Analyzed large health datasets to optimize claim processing and customer segmentation. Built ETL pipelines and interactive dashboards to support business decisions. Conducted A/B testing and modeled financial data to guide pricing strategies.

Johnson & Johnson, Data Analyst

Built and maintained SSIS/SSRS reports and SQL pipelines. Automated financial reports including Profit & Loss, EBIT, and ROIC. Used SAS and SQL for statistical modeling and regression analysis to support business and mortgage analytics.

Education

University of North Carolina at Charlotte, Master of Science

Related coursework: Artificial Intelligence, Visual Analytics, Knowledge Discovery in Databases, Database Systems.

VIT University, Bachelor of Science

Paper: Privacy-Preserving Searchable Encryption using Blockchain. Developed a secure file storage and sharing system leveraging public and private blockchain networks to ensure data integrity and confidentiality. Implemented using JavaScript, JSP, and Servlets, this project reflects a strong foundation in secure system design and blockchain-based data handling, skills that complement data science in privacy-focused applications.

Projects I've Built Using My Skills

Reddit Data Pipeline

Built a cloud-based ETL pipeline to extract Reddit posts, store them in S3, transform with AWS Glue, and query using Athena and Redshift. Orchestrated with Airflow & Celery, containerized using Docker.

Netflix-GPT

A React-based streaming app offering personalized movie recommendations using OpenAI’s GPT. Features secure Firebase login, TMDB integration, and a clean, intuitive UI powered by Redux.

Text-Image-Generator

Built a high-performance dataset generator using SynthTIGER to create annotated text images for OCR training. Enabled scalable generation with masks, bounding boxes, and multilingual support, boosting model accuracy and speed.

Hey folks, I'm Sri Hari, a Data Scientist exploring the power of AI and data.
