shing yee.

Work Experience

This page outlines my past and current internship experience.

Government Technology Agency of Singapore (GovTech)

Data Scientist Intern

Jun 2024 - Dec 2024

  • Fine-tuned and trained a cross-encoder model using Amazon SageMaker, processing over 2 million samples to detect off-topic localised prompts. Curated and evaluated a 40,000-sample dataset with Google Cloud Platform (GCP) to enhance model accuracy and ensure high-quality data inputs.
  • Benchmarked large language models (LLMs) using Amazon Bedrock Guardrails and Azure AI Content Safety across 800+ samples, identifying key model gaps and implementing targeted improvements to optimise performance and content safety measures.
  • Optimised machine learning models in Python with scikit-learn by applying hyperparameter tuning, grid search, and cross-validation. These enhancements reduced the time officers spent on prioritising company engagements by 20%, significantly improving operational efficiency.
  • Developed a Streamlit application to manage data validation and secure file handling, incorporating Amazon S3 presigned URLs for secure access. Automated backend workflows with Amazon SageMaker, delivering a seamless end-to-end solution that simplified data processing.
  • Created a highly reproducible dashboard with Streamlit, integrated with CI/CD pipelines for continuous updates and deployments. Ensured scalability and reliability by leveraging Docker and Kubernetes, while maintaining code quality with Pytest for automated testing.
  • Deployed a Streamlit application that generates LLM-powered call reports customised for officer IDs. Optimised prompt generation with DSPy and LLM-as-a-Judge, reducing the time required to prepare call reports by 60%.

Python

Machine Learning

LLM

PyTorch

Scikit-learn

Streamlit

CStack

Pytest

Git

Docker

Agency of Science, Technology and Research (A*STAR)

Deep Learning Research Intern

Sep 2023 - Dec 2023

  • Collaborated with a multidisciplinary team of scientists to design and deploy a UNET neural network architecture, leveraging deep learning and computer vision techniques for a skin disease diagnosis project. Achieved approximately 90% segmentation accuracy, as measured by Intersection over Union (IoU).
  • Managed and annotated over 300 high-quality, specialised images using LabelMe. Used PyTorch for deep learning model experimentation, achieving a 20% improvement in image analysis precision.
  • Optimised the model by tuning hyperparameters, including learning rate and batch size, to maximise accuracy and overall performance, ensuring efficient training and evaluation.
  • Compiled and presented comprehensive research findings to the team, fostering knowledge-sharing and enhancing collective understanding.
  • Facilitated informed decision-making by highlighting the project's broader implications and potential applications.

Python

Deep Learning

Computer Vision

Image Analysis

Data Annotation

LabelMe

PyTorch

UNET

Sysmex

Data Scientist Intern

May 2023 - Jul 2023

  • Engineered a predictive model attaining ~98% accuracy using machine learning algorithms, including Decision Trees, Random Forest, and Gradient Boosted Trees, together with statistical techniques and pattern recognition. Used R's caret package to streamline model selection, training, and evaluation.
  • Communicated findings and insights to both technical and non-technical stakeholders through advanced data visualisation tools such as ggplot2, Lattice, and Plotly.
  • Spearheaded the implementation of an efficient data pre-processing pipeline, optimising cleaning, transformation, and feature engineering processes. This significantly improved project efficiency and enhanced dataset quality for predictive modelling.
  • Collaborated in a diverse R&D team, establishing effective communication channels with industry leaders for insights and fostering evidence-driven decision-making and strategic development.

R

Machine Learning

Data Preprocessing

Statistical Techniques

Data Visualisation

caret

ggplot2

Lattice

Plotly

Last updated: Dec 2024