Projects

Deciphering Poorly Formatted Text using Private LLM

Process unstructured text data from document files efficiently while prioritizing data privacy by using locally hosted private large language models. Employed LoRA, a parameter-efficient fine-tuning (PEFT) method, to fine-tune the summarizer and classifier efficiently; combined with prompt engineering and LLM reasoning, this yielded improved performance metrics.

Apr 25, 2023
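
LoRA keeps the pretrained weight matrix frozen and trains only a low-rank update, which is why it makes fine-tuning cheap. The plain-Python sketch below illustrates that idea for a single layer; it is not the project's code, and the matrix sizes, rank `r`, and scaling factor `alpha` are hypothetical. In practice this is handled by a library rather than written by hand.

```python
# Minimal sketch of a LoRA-adapted linear layer, written without any ML library.
# W is the frozen pretrained weight (d_out x d_in); only A (r x d_in) and
# B (d_out x r) would be trained. Effective weight: W + (alpha / r) * B @ A.

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha):
    r = len(A)                 # rank of the low-rank update
    delta = matmul(B, A)       # d_out x d_in update built from two thin matrices
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for one layer: (full fine-tuning, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[0.5, 0.5]]        # r = 1
    B = [[0.0], [0.0]]      # B starts at zero, so the adapted layer starts equal to W
    print(lora_effective_weight(W, A, B, alpha=8))   # same values as W
    print(trainable_params(4096, 4096, 8))           # (16777216, 65536)
```

For a hypothetical 4096 x 4096 layer with rank 8, LoRA trains about 0.4% of the parameters that full fine-tuning would.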

Patient Severity Prediction - Multivariate Time Series Data

Enhance hospital decision-making in patient care by constructing a predictive model that uses time-series data and radiology reports to forecast intracerebral hemorrhage outcomes. The XGBoost-based model achieved an accuracy of 74.11% and a precision of 75.51%, demonstrating its ability to identify high-risk patients. Data were sourced from PubMed.

Apr 20, 2023
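
Accuracy and precision answer different questions: accuracy counts all correct predictions, while precision asks how many of the patients flagged as high-risk truly were. A minimal sketch of both metrics from binary labels follows, assuming 1 marks a high-risk outcome; the labels in the usage line are toy values, not project data.

```python
# Accuracy and precision from binary labels (1 = high-risk outcome, 0 = not).

def confusion_counts(y_true, y_pred):
    """Return (true positives, true negatives, false positives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def precision(y_true, y_pred):
    # Of the patients predicted high-risk, the fraction that truly were.
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0]   # toy labels
    y_pred = [1, 0, 0, 1, 1]
    print(accuracy(y_true, y_pred), precision(y_true, y_pred))  # 0.6 0.666...
```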

Comparison of Summarization Techniques Using Language Models

Explored various NLP tasks and the impact of transformer-based models on text summarization, focusing on BERT, GPT-2, and T5. Compared summarization models, including DistilBART, Facebook BART, and Conversational BART, for generating effective summaries of medical reports. Addressed challenges posed by input size limits and domain-specific vocabulary.

Mar 28, 2023
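
BART-style models accept only a fixed number of input tokens (1,024 for the standard BART checkpoints), so long medical reports have to be shortened or split. One common workaround, shown here as an assumption rather than the project's confirmed approach, is to summarize overlapping chunks and then join the partial summaries. The sketch counts words as a rough stand-in for tokens; the default window sizes are hypothetical.

```python
# Split a long report into overlapping word windows that each fit a
# summarizer's input limit. Words approximate tokens here; a real pipeline
# would count tokens with the model's own tokenizer.

def chunk_words(text, max_words=400, overlap=50):
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap          # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):   # last window reached the end
            break
    return chunks

if __name__ == "__main__":
    report = " ".join(str(i) for i in range(10))
    print(chunk_words(report, max_words=4, overlap=1))
    # ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a boundary visible to both neighboring chunks, at the cost of some repeated content in the combined summary.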

Web Scraping - Higher Education Funding Offers

Addresses funding challenges faced by international students in US higher education. Developed a web scraping pipeline using requests, Selenium, BeautifulSoup, pandas, and re to extract detailed information from CSRankings about professors who are looking for new students.

Mar 20, 2023
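
The filtering step of such a pipeline can be sketched as: pull the visible text out of a professor's page, then search it with a regular expression for recruiting language. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages; the phrase list in `RECRUITING` is an illustrative assumption, not the project's actual pattern set.

```python
import re
from html.parser import HTMLParser

# Hypothetical phrases that often signal a professor is recruiting students.
RECRUITING = re.compile(
    r"(looking for|recruiting|accepting|seeking)\s+(new\s+)?(phd\s+)?students",
    re.IGNORECASE,
)

class TextExtractor(HTMLParser):
    """Collect visible text from HTML (a stdlib stand-in for BeautifulSoup's get_text)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def is_recruiting(html):
    """True if the page's visible text mentions taking on new students."""
    parser = TextExtractor()
    parser.feed(html)
    text = re.sub(r"\s+", " ", " ".join(parser.parts))  # normalize whitespace
    return bool(RECRUITING.search(text))

if __name__ == "__main__":
    page = "<p>I am <b>looking for new PhD students</b> for Fall.</p>"
    print(is_recruiting(page))  # True
```

In a full pipeline, `requests` or Selenium would fetch each page first; Selenium is needed only where content is rendered by JavaScript.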

Cryptic Species Classification Using SVM

Developed a Support Vector Machine (SVM) model to classify two cryptic species of shellfish, achieving 97% accuracy. Explained SVM concepts and their mathematical formulation, including handling non-separable data with slack variables and non-linear decision boundaries with the kernel trick. Highlighted the effectiveness of SVMs for distinguishing species without genetic analysis.

Mar 10, 2023
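
For reference, the standard soft-margin SVM formulation that the slack variables and kernel trick belong to is given below, with labels $y_i \in \{-1, +1\}$ for the two species, penalty $C$, and slack $\xi_i$. The RBF kernel is shown as one common choice; the kernel the project actually used is not stated here.

```latex
% Soft-margin primal: slack variables \xi_i let some points violate the margin
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% Dual: the data enter only through inner products, which a kernel K replaces
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Decision rule, and the RBF kernel as one common choice of K
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^{2}\right)
```

A larger $C$ penalizes margin violations more heavily, trading a wider margin for fewer misclassified training shells.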

Data Preprocessing for Kaggle Dataset

Preprocessed a Kaggle dataset using a range of data cleaning and feature engineering techniques. Strategies such as dropping NaN values, one-hot encoding, and median imputation yielded an average accuracy of 0.76. Engineering new features from the Name and Cabin variables further improved model performance, with SVM accuracy reaching 0.81. Highlighted the importance of feature engineering and the effectiveness of ensemble models.

Jan 25, 2023
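
The cleaning steps named above can be sketched in plain Python on rows stored as dicts, with `None` marking a missing value. Column names such as `Age` and `Embarked` are hypothetical, and the Name and Cabin helpers show one common way to derive features (an honorific title and a deck letter), assuming names are formatted as "Surname, Title. Given names"; they are not necessarily the features the project built.

```python
import re
from statistics import median

def impute_median(rows, col):
    """Replace missing numeric values in `col` with the median of observed values."""
    fill = median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = fill
    return rows

def one_hot(rows, col):
    """Replace categorical `col` with one 0/1 indicator column per category."""
    categories = sorted({r[col] for r in rows if r[col] is not None})
    for r in rows:
        value = r.pop(col)
        for c in categories:
            r[f"{col}_{c}"] = int(value == c)
    return rows

def name_title(name):
    """Extract a title like 'Mrs' from 'Surname, Mrs. Given Names' (assumed format)."""
    m = re.search(r",\s*([^.]+)\.", name)
    return m.group(1).strip() if m else "Unknown"

def cabin_deck(cabin):
    """Use the cabin's leading letter as a deck feature; 'U' when the cabin is unknown."""
    return cabin[0] if cabin else "U"

if __name__ == "__main__":
    rows = [{"Age": 20.0}, {"Age": None}, {"Age": 40.0}]
    print(impute_median(rows, "Age"))   # missing Age becomes 30.0
    print(name_title("Doe, Mrs. Jane"), cabin_deck("C85"), cabin_deck(None))  # Mrs C U
```

Median imputation is preferred over the mean here because it is less sensitive to a few extreme values.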

NYC Air Quality Data Analysis

Analyzed NYC air quality data for pollutants including nitrogen dioxide, sulfur dioxide, ozone, and PM2.5. Showed a steady improvement in air quality over the past decade, with significant reductions in nitrogen dioxide and PM2.5 levels. Used data visualization to highlight the impact of emission norms and regulations, and emphasized the importance of maintaining good air quality and continuing pollution reduction efforts.

Jan 10, 2023
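
A downward trend like the one described can be quantified rather than only plotted: average the measurements per year, then fit a least-squares line, where a negative slope means concentrations are falling. The sketch below assumes the data arrive as hypothetical `(year, value)` pairs for one pollutant and uses only the standard library.

```python
from collections import defaultdict

def yearly_means(records):
    """Average (year, value) measurements into one mean per year, in year order."""
    sums = defaultdict(lambda: [0.0, 0])
    for year, value in records:
        sums[year][0] += value
        sums[year][1] += 1
    return {y: s / c for y, (s, c) in sorted(sums.items())}

def linear_trend(years, values):
    """Ordinary least-squares fit; returns (slope per year, intercept)."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def percent_change(first, last):
    """Relative change from the first to the last value, in percent."""
    return (last - first) / first * 100
```

Usage: `means = yearly_means(records)` followed by `linear_trend(list(means), list(means.values()))` gives the average change per year for that pollutant.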

News Website with a Recommendation System

Built a platform that displays stories from the Hacker News API and recommends the most recent stories based on the user's preferences for older stories. Implemented an LSTM-based binary classifier on story metadata to predict user preference.

Dec 12, 2022
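
Once a classifier outputs the probability that a user will like a story, the recommendation step reduces to filtering and sorting. In the sketch below, `score_fn` is a hypothetical stand-in for the trained LSTM classifier, and the `threshold` and `k` defaults are assumptions; `time` mirrors the Unix timestamp field that Hacker News API items carry.

```python
# Recommend recent stories using a preference classifier's scores.
# score_fn stands in for the trained LSTM: any callable mapping a story's
# metadata dict to P(user likes it).

def recommend(stories, score_fn, threshold=0.5, k=10):
    """Return up to k stories predicted as liked, newest first."""
    liked = [s for s in stories if score_fn(s) >= threshold]
    liked.sort(key=lambda s: s["time"], reverse=True)
    return liked[:k]

if __name__ == "__main__":
    stories = [
        {"id": 1, "time": 100, "title": "a"},
        {"id": 2, "time": 300, "title": "b"},
        {"id": 3, "time": 200, "title": "c"},
    ]
    toy_score = lambda s: 0.1 if s["id"] == 2 else 0.9   # placeholder classifier
    print([s["id"] for s in recommend(stories, toy_score)])  # [3, 1]
```

Keeping the classifier behind a plain callable lets the ranking logic be tested independently of the model.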

Image Style Transfer with VGG Network

Implemented neural style transfer with convolutional neural networks. Used a pretrained VGG-19 network to extract content and style representations from different layers of two images and combine them into a new target image.

Dec 10, 2021
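
In neural style transfer, the style representation of a layer is usually the Gram matrix of its feature maps (channel-by-channel correlations), while content is compared on the raw feature maps. The plain-Python sketch below shows both losses on a `C x (H*W)` list of flattened activation maps; in the project these would come from VGG-19 layers, and the normalization constants here are one choice among several used by different implementations.

```python
# features: list of C channels, each a flattened H*W activation map.

def gram_matrix(features):
    """G[i][j] = sum_k F[i][k] * F[j][k], divided by the number of elements C * N."""
    c = len(features)
    n = len(features[0])
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (c * n)
             for j in range(c)] for i in range(c)]

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g = gram_matrix(gen_features)
    s = gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)

def content_loss(gen_features, content_features):
    """Mean squared difference between feature maps at the content layer."""
    diffs = [(a - b) ** 2
             for ga, ca in zip(gen_features, content_features)
             for a, b in zip(ga, ca)]
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    F = [[1.0, 2.0], [3.0, 4.0]]   # 2 channels, 2 spatial positions
    print(gram_matrix(F))          # [[1.25, 2.75], [2.75, 6.25]]
```

Because the Gram matrix sums over spatial positions, it captures which features co-occur (texture and color) while discarding where they occur, which is why it represents style rather than content.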

Sequence to Sequence Stock Time Series Data Prediction Model

Developed an LSTM-based seq2seq model for stock prediction using the Alpha Vantage API and PyTorch. Implemented features including a CLI for stock prediction, data plotting, model saving, and support for custom datasets. Achieved accurate predictions and provided extensive documentation and usage guidelines.

Apr 15, 2021
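
Before a seq2seq model can be trained, the price series has to be cut into pairs where the encoder sees a window of past values and the decoder learns to emit the next few. A minimal sketch of that windowing, plus min-max scaling, follows; the window lengths are hypothetical, and the series would in practice hold prices fetched from Alpha Vantage.

```python
# Build (encoder input, decoder target) pairs for a seq2seq forecaster.

def make_windows(series, in_len, out_len, stride=1):
    """Slide over the series: x = in_len past steps, y = the next out_len steps."""
    pairs = []
    last_start = len(series) - in_len - out_len
    for start in range(0, last_start + 1, stride):
        x = series[start:start + in_len]
        y = series[start + in_len:start + in_len + out_len]
        pairs.append((x, y))
    return pairs

def min_max_scale(series):
    """Scale values to [0, 1]; also return (min, max) so predictions can be inverted."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0   # avoid division by zero on a flat series
    return [(v - lo) / span for v in series], (lo, hi)

if __name__ == "__main__":
    print(make_windows([1, 2, 3, 4, 5, 6], in_len=3, out_len=2))
    # [([1, 2, 3], [4, 5]), ([2, 3, 4], [5, 6])]
```

Fitting the scaler on the training portion only, and reusing its `(min, max)` for validation data, avoids leaking future price ranges into training.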