About Me

I recently graduated with a Master's in Public Policy with a Data Science specialty from the University of Chicago Harris School of Public Policy. During my time at Harris, I took courses in Statistics and Program Evaluation, Python Programming, Linear Algebra, Machine Learning (including Advanced Computer Vision and Deep Learning, and NLP), Data Engineering, and Cloud Computing and Engineering. Here is a list of projects that I have completed or am currently working on:

1) For the final project in Data Engineering, I worked with a team of four to build a SQL database to store the Metropolitan Museum of Art’s extensive artwork collection data (500,000+ objects). I designed and implemented a comprehensive data schema in MySQL, a relational database management system (RDBMS), to organize the museum’s objects and improve data accessibility and integrity for a range of analytical applications. DEP Final Project Repo

2) For my final project in Data and Programming, I completed a sentiment analysis of major US news article titles related to Israel, Hamas, and Palestine, focusing on subjectivity and word usage. I used NewsAPI.org to collect thousands of news headlines and automated the data extraction and management processes with custom Python scripts, storing the results in a GitHub repository for easy access and reproducibility. My analysis found that US news outlets were more likely to describe Palestine and Hamas subjectively in news article titles. Additionally, I created an interactive dashboard using Shiny and Plotly that dynamically visualizes the analysis and highlights trends in the media portrayal of these geopolitical entities. News Sentiment Analysis Repo

3) I worked with two other students on a final project in Advanced Computer Vision and Deep Learning, which involved training classification and segmentation models to identify deforestation in satellite images of the Amazon rainforest. I trained a custom U-Net model for semantic segmentation that achieved 75% training accuracy despite limited computing resources. Additionally, I designed the MLOps pipeline for model deployment and updates in Amazon SageMaker. Identifying Deforestation Repo Segmentation Section

4) For my final project in Big Data and High-Performance Computing/Cloud Engineering, I built a data pipeline and system to analyze the Normalized Difference Vegetation Index (NDVI) in Landsat satellite imagery. The process involves creating a GeoJSON polygon for the geospatial region of interest, then using EC2 and ThreadPoolExecutor to parallelize the extraction and processing of Landsat image data, which is stored in an S3 bucket. I created Lambda functions to automatically ingest new images, update the Parquet file, and submit Spark jobs for analysis. I then created an AWS Step Functions workflow to ensure that each step, from data ingestion to updating the Parquet file and performing NDVI analysis using Spark on an EMR cluster, happens in the correct order.
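
As a rough illustration of the NDVI project's parallel processing step, here is a minimal sketch rather than the project's actual code. It assumes each tile is already available as two lists of red and near-infrared (NIR) reflectance values, and it uses the standard formula NDVI = (NIR - Red) / (NIR + Red). The function names (`ndvi`, `tile_mean_ndvi`, `process_tiles`) and the sample values are hypothetical; the real pipeline reads Landsat imagery and writes to S3 and Parquet, which is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor


def ndvi(red, nir):
    """Return NDVI for one pixel; 0.0 when both bands are zero (no data)."""
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def tile_mean_ndvi(tile):
    """Compute the mean NDVI of one non-empty tile given as (red_values, nir_values)."""
    red_values, nir_values = tile
    scores = [ndvi(r, n) for r, n in zip(red_values, nir_values)]
    return sum(scores) / len(scores)


def process_tiles(tiles, max_workers=4):
    """Process tiles concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tile_mean_ndvi, tiles))


# Hypothetical reflectance values standing in for Landsat red and NIR bands.
example_tiles = [
    ([0.1, 0.2], [0.5, 0.6]),  # tile with strong NIR response (vegetation)
    ([0.3, 0.3], [0.3, 0.3]),  # tile with equal red and NIR (NDVI of zero)
]
results = process_tiles(example_tiles)
```

Threads suit this step when most of the time is spent waiting on image downloads rather than on computation; the heavier NDVI analysis in the project runs in Spark instead.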

My academic and professional pursuits in data science are driven by a commitment to public service and by intellectual curiosity, and they are rooted in a passion for using technology to address complex societal issues. I am particularly motivated by the potential of data-driven insights to influence public policy and create meaningful change. Looking forward, I aim to apply my expertise in data science to evidence-based policymaking, in roles that allow me to shape and implement innovative solutions to global challenges. My ultimate goal is to lead initiatives that bridge the gap between data science and public policy, ensuring that technological advancements serve the public good and enhance community well-being on a broad scale.