drnsmith/README.md

Hi 👋, I'm Natasha — a Data Scientist with a background in business, econometrics, and computer science.

I started in business and economics (Master’s), moved into statistical modelling and risk analysis (PhD), and later added computer science (Master’s) to strengthen my engineering foundations. My work now spans the full data and ML/AI stack: pipelines → modelling → evaluation → deployment → agentic systems.

I’m used to working with imperfect datasets where uncertainty, sampling, and data quality checks matter as much as the model itself.

I focus on:

  • Statistical modelling & inference (frequentist + Bayesian)
  • Machine learning & NLP (classical + LLM-based)
  • GenAI / agentic systems (LangChain, RAG, vector stores, APIs)
  • Data & software engineering fundamentals (Python, SQL, dbt, Snowflake)
  • Data visualisation (matplotlib, seaborn, plotly; Tableau & Power BI for stakeholder dashboards), including model diagnostics & uncertainty

My approach is simple: strong statistical reasoning, practical engineering, and clear communication.

💼 My Featured Projects:

A modular multi-agent framework using AutoGen, LlamaIndex, and LangChain to assess financial, compliance, and market risk. Each agent analyses structured and unstructured sources (news, filings, data reports), collaborating via LLM orchestration to surface early risk signals.

End-to-end ML pipeline predicting customer churn using XGBoost and SHAP for explainability, deployed via Streamlit with an LLM insights layer for natural language recommendations.

An AI system combining LLM fact-checking, NLP-based bias detection, and graph-based misinformation propagation analysis. Scrapes news and social media via APIs, delivering explainable insights in a Gradio UI powered by GPT-4o and LangChain.

Extracts and analyses web articles in real time, using Transformers for summarisation, bias scoring, and sentiment detection. Integrates FastAPI with Gradio for an interactive tool that empowers users to evaluate media quality and bias on the fly.

An intelligent recipe assistant that helps users discover, generate, and save recipes. Built with Streamlit for a clean, interactive UI, it lets users search a database of 2M+ recipes, generate recipes with LLMs, and save favourites for easy access. The backend uses Python, SQL, and NLP for efficient querying and personalised recommendations.

Developed a robust client-server application using Python socket programming. Features include secure communication through serialisation, file encryption, and multi-threading.

Leveraged Hadoop, MapReduce, and sentiment analysis to process over 4 million tweets about NASDAQ-listed companies, extracting insights into public opinion and sentiment trends.

Applied various advanced machine learning techniques to model and predict PM10 air pollution levels in London (United Kingdom), supporting urban policy decisions for cleaner, healthier cities.

A backend engineering project that wraps raw SQL queries in secure, production-ready APIs. Built with Node.js, Express.js, and PostgreSQL, it implements JWT-based Role-Based Access Control (RBAC) with granular permissions for three user roles (admin, manager, and general user), controlling what each role can read, write, or modify. The project follows RESTful principles and shows how to wrap SQL logic in a scalable, maintainable API layer, which suits internal tools, reporting dashboards, and frontend apps that need secure backend data access.

Demonstrated the application of evolutionary principles, including mutation, crossover, and selection, to optimise a challenging combinatorial puzzle.

Built a deep learning pipeline to detect pneumonia from chest X-rays. The project compares a custom CNN with a pre-trained VGG16 model for binary classification.

Built and trained an innovative AI classifier to categorise recipes into difficulty levels by analysing ingredients and preparation steps using machine learning and NLP.

Designed a scalable data warehouse for a leading office supply company, featuring a Snowflake schema, ETL processes, and Oracle Database implementation for analytics.

Used advanced NLP techniques (BERT embeddings and LDA topic modelling) to cluster and semantically analyse recipes, uncovering trends in culinary narratives.

Explored the role of colour normalisation techniques in improving deep learning models for cancer detection from histopathology images.

Built custom CNNs for histopathology classification using the BreakHis dataset, incorporating advanced techniques like dropout, transfer learning, and batch normalisation.

Implemented a dense neural network (DNN) to classify clothing items in the Fashion MNIST dataset, exploring DNN architecture design, activation functions, and regularisation techniques such as dropout.

Modelled and predicted PM10 pollution trends in Auckland (New Zealand) using regression models and LSTM networks to inform environmental policy decisions.

Tackled class imbalance and model interpretability challenges in breast cancer diagnosis using cutting-edge architectures like ResNet, DenseNet, and EfficientNet.

📝 How I Think & Where I Write

I write about data, uncertainty, AI, human decision-making, and the psychology of systems.
New connections are always welcome — I also publish a newsletter on LinkedIn.

Python · JavaScript · React · HTML5 · CSS3


Popular repositories

  1. Histopathology-AI-BreastCancer (Public)

    Deep learning models for breast cancer diagnosis using histopathology images. Techniques include advanced CNN architectures, class balancing, and Grad-CAM interpretability.

    Jupyter Notebook · 1 star · 1 fork

  2. drnsmith (Public)

    Config files for my GitHub profile.

  3. ML-foundations (Public)

    Forked from jonkrohn/ML-foundations

    Machine Learning Foundations: Linear Algebra, Calculus, Statistics & Computer Science

    Jupyter Notebook

  4. FloydWarshallAlgorithm (Public)

    Implementation of the Floyd Warshall algorithm

    Python

  5. Client-Server-Network-Socket-Programming (Public)

    This project builds a client-server network.

    Python

  6. SQLtoAPI-RBAC (Public)

    The purpose of this project is to create SQL queries to support functional requirements, create a business logic layer to have API for each functional requirement using NodeJS, test and demonstrate…

    JavaScript
