Natalya (Natasha) Smith drnsmith

Hi 👋, I'm Natasha — a Data Scientist with a background in business, econometrics, and computer science.

I started in business and economics (Master’s), moved into statistical modelling and risk analysis (PhD), and later added computer science (Master’s) to strengthen my engineering foundations. My work now spans the full data and ML/AI stack: pipelines → modelling → evaluation → deployment → agentic systems.

I’m used to working with imperfect datasets where uncertainty, sampling, and data quality checks matter as much as the model itself.

I focus on:

Statistical modelling & inference (frequentist + Bayesian)
Machine learning & NLP (classical + LLM-based)
GenAI / agentic systems (LangChain, RAG, vector stores, APIs)
Data & software engineering fundamentals (Python, SQL, dbt, Snowflake)
Data visualisation: matplotlib, seaborn, plotly; model diagnostics & uncertainty; Tableau & Power BI (stakeholder dashboards)

My approach is simple: strong statistical reasoning, practical engineering, and clear communication.

💼 My Featured Projects:

Multi-Agent AI System for Business Risk Assessment

A modular multi-agent framework using AutoGen, LlamaIndex, and LangChain to assess financial, compliance, and market risk. Each agent analyses structured and unstructured sources (news, filings, data reports), collaborating via LLM orchestration to surface early risk signals.

Churn Prediction & Retention Analytics Platform

End-to-end ML pipeline predicting customer churn using XGBoost and SHAP for explainability, deployed via Streamlit with an LLM insights layer for natural language recommendations.

Fast-Track Fake News Detector: Real-Time Misinformation Analysis

An AI system combining LLM fact-checking, NLP-based bias detection, and graph-based misinformation propagation analysis. Scrapes news and social media via APIs, delivering explainable insights in a Gradio UI powered by GPT-4o and LangChain.

AI Web Analyser: Credibility & Bias Detection for Online Content

Extracts and analyses web articles in real time, using Transformers for summarisation, bias scoring, and sentiment detection. Integrates FastAPI with Gradio for an interactive tool that empowers users to evaluate media quality and bias on the fly.

AI Recipe Assistant: Where Data Science Meets Engineering

Buil an intelligent recipe assistant that helps users discover, generate, and save recipes. Built with Streamlit for a clean, interactive UI, it enables users to search a database of 2M+ recipes, generate AI-powered recipes using LLMs, and save favourites for easy access. The backend is powered by Python, SQL, and NLP, ensuring efficient querying and personalised recommendations.

Secure Communication Framework: Client-Server System with Python and Cryptography

Developed a robust client-server application using Python socket programming. Features include secure communication through serialisation, file encryption, and multi-threading.

Big Data Sentiment Analysis: Hadoop, Hive, and NLP for Sentiment Analysis of NASDAQ Companies

Leveraged Hadoop, MapReduce, and sentiment analysis to process over 4 million tweets about NASDAQ-listed companies, extracting insights into public opinion and sentiment trends.

Machine Learning Predictive Analytics for PM10 Pollution: Using Random Forests, Gradient Boosting, XGBoost, Ridge and Lasso Regressions, and Neural Network Regressor

Applied various advanced machine learning techniques to model and predict PM10 air pollution levels in London (United Kingdom), supporting urban policy decisions for cleaner, healthier cities.

SQL to API: Functional APIs with Role-Based Access Control

A backend engineering project that transforms raw SQL queries into secure, production-ready APIs. This system leverages Node.js, Express.js, and PostgreSQL, implementing JWT-based Role-Based Access Control (RBAC) to ensure granular permissions for different user roles—admin, manager, and general user. It enables efficient access to data with strict control over what users can read, write, or modify. Built with RESTful principles, the project demonstrates how to wrap SQL logic in a scalable, maintainable API layer, making it ideal for internal tools, reporting dashboards, or frontend apps that need secure backend data access.

Combinatorial Optimisation with Genetic Algorithms: Solving the 4x4 Puzzle

Demonstrated the application of evolutionary principles, including mutation, crossover, and selection, to optimise a challenging combinatorial puzzle.

AI-Driven Pneumonia Detection Using CNNs and VGG16

Built a deep learning pipeline to detect pneumonia from chest X-rays. The project compares a custom CNN with a pre-trained VGG16 model for binary classification.

AI-Powered Recipe Difficulty Classification with NLP and ML

Built and trained an innovative AI classifier to categorise recipes into difficulty levels by analysing ingredients and preparation steps using machine learning and NLP.

Scalable Data Warehouse Design with Snowflake and Oracle

Designed a scalable data warehouse for a leading office supply company, featuring a Snowflake schema, ETL processes, and Oracle Database implementation for analytics.

NLP-Driven Recipe Clustering: Topic Modelling with BERT and LDA

Used advanced NLP techniques (BERT embeddings and LDA topic modelling) to cluster and semantically analyse recipes, uncovering trends in culinary narratives.

Colour Normalisation in Deep Learning: Enhancing Histopathology Image Classification

Explored the role of colour normalisation techniques in improving deep learning models for cancer detection from histopathology images.

Custom CNNs for Histopathology Tumour Classification

Built custom CNNs for histopathology classification using the BreakHis dataset, incorporating advanced techniques like dropout, transfer learning, and batch normalisation.

Building, Training and Deploying Dense Neural Network for Classifying Images

Implemented a dense neural network (DNN) for classifying images in the Fashion MNIST dataset with the goal to explore DNN architecture design, activation functions, and regularisation techniques such as dropout, achieving accurate classification of clothing items.

AI-Powered Air Quality Prediction with Regression Models and Advanced Machine Learning Techniques (LSTM)

Harnessed AI to model and predict PM10 pollution trends in Auckland (New Zealand) using advanced regression models and LSTM networks to aid environmental policy decisions.

Histopathology AI: Breast Cancer Detection with ResNet, DenseNet, and EfficientNet

Tackled class imbalance and model interpretability challenges in breast cancer diagnosis using cutting-edge architectures like ResNet, DenseNet, and EfficientNet.

📝 How I Think & Where I Write

I write about data, uncertainty, AI, human decision-making, and the psychology of systems.
New connections are always welcome — I also publish a newsletter on LinkedIn.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Natalya (Natasha) Smith drnsmith

Block or report drnsmith

Hi 👋, I'm Natasha — a Data Scientist with a background in business, econometrics, and computer science.

💼 My Featured Projects:

📝 How I Think & Where I Write

Popular repositories Loading

Uh oh!

Search code, repositories, users, issues, pull requests...

Natalya (Natasha) Smith drnsmith

Hi 👋, I'm Natasha — a Data Scientist with a background in business, econometrics, and computer science.

💼 My Featured Projects:

📝 How I Think & Where I Write

Popular repositories Loading

Uh oh!