Marti Soura Vamseekar
ML Engineer · Data Scientist · Systems Architect
Available for PhD & Research Roles

Building intelligent systems

ML Engineer & Data Scientist specializing in transport analytics, hyperspectral imaging, and FinTech infrastructure. Turning complex data into production-ready intelligence.

Python · Java · TypeScript · Machine Learning · Spring Boot · Next.js · Azure
Engineering Philosophy
01

Research-Driven Innovation

Literature review and SOTA analysis. Solutions grounded in peer-reviewed methodologies.

02

Reproducible Science

Version-controlled experiments with proper train/test splits and cross-validation.

03

End-to-End Ownership

Azure Data Factory orchestrates, Databricks transforms, Power BI delivers insights.

04

Cloud-Native by Default

Data Lake Gen2 for storage, Synapse for analytics, Databricks for compute.

Featured Work

Projects & Research

From ML research to production systems handling real revenue. Each project demonstrates end-to-end thinking.

UK Bus Analytics ML Platform

Production-grade geospatial analytics platform analyzing 779,262 bus stops across England with ML-powered policy insights.

ML

Focus: Transport analytics · Policy impact · ML for social good · Equity analysis

Research-grade platform integrating 779,262 stops with Census 2021, IMD 2019, and NOMIS data (97-99% match rate). Three ML models: route clustering (198 types via HDBSCAN), anomaly detection (571 underserved areas identified), and coverage prediction (R²=0.089, so about 91% of the variance in coverage is not explained by the modeled features, which suggests coverage is largely policy-driven). TAG 2024 & HM Treasury Green Book compliant. 22 novel capabilities vs existing £100k+ consulting reports. Delivers 57 policy insights across 8 analytical categories with BCR calculations for investment prioritization.

Python · Sentence-Transformers · HDBSCAN · PyOD · Scikit-learn · +5 more
2022 – Present
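The 91% figure in the UK Bus Analytics description follows from the reported R² by simple arithmetic. The sketch below shows that conversion with a from-scratch R² helper. The function names are illustrative and not taken from the project, and reading the unexplained share as "policy-driven" is the project's interpretation rather than something the arithmetic establishes.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def unexplained_share(r2):
    """Share of variance the model's features do not account for."""
    return 1.0 - r2


# Reported value from the coverage-prediction model
print(round(unexplained_share(0.089), 3))  # 0.911
```

A model that always predicts the mean scores R² = 0, so a value of 0.089 means the features add only a little beyond that baseline.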

SAINTS: Uncertainty-Aware Deep Learning for HSI Classification

Bayesian neural networks that know when they might be wrong - first validated uncertainty quantification for hyperspectral imaging.

Research

Focus: Uncertainty quantification · Safe AI deployment · Robust classification · Reproducible science

SAINTS (Spatially-Aware Interpretable Neural Uncertainty System) achieves 94.73% accuracy with 0.45 uncertainty-error correlation on agricultural datasets. Novel contributions: (1) First validated UQ for HSI with 6.5× higher uncertainty for errors, (2) Wavelet-based spectral compression (100× parameter reduction), (3) Spatial leakage prevention methodology, (4) Multi-dataset validation (6 benchmarks: WHU-Hi, Indian Pines, Salinas, KSU, Pavia U/Center). Targeting Taylor & Francis publication. Enables safe deployment: 21.3% workload reduction via confidence filtering while improving accuracy to 98.06%.

PyTorch · Bayesian Deep Learning · MC Dropout · DWT Compression · LSTM · +2 more
2023 – Present
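The confidence filtering mentioned for SAINTS can be illustrated with a small, framework-free sketch: defer the most uncertain predictions to a human and measure accuracy on what remains. This is a minimal illustration, not the project's implementation. The function name, the per-sample uncertainty scores, and the toy data are all assumptions.

```python
def filter_by_uncertainty(predictions, labels, uncertainties, reject_fraction):
    """Defer the most uncertain samples and score the remainder.

    Returns (kept_indices, accuracy_on_kept).
    """
    n = len(predictions)
    n_reject = int(round(n * reject_fraction))
    # Order sample indices from most certain to least certain
    order = sorted(range(n), key=lambda i: uncertainties[i])
    kept = order[: n - n_reject]
    if not kept:
        return kept, float("nan")
    correct = sum(1 for i in kept if predictions[i] == labels[i])
    return kept, correct / len(kept)


# Toy data (hypothetical): the single wrong prediction is also the least certain
preds = [0, 1, 1, 0, 1]
truth = [0, 1, 0, 0, 1]
unc = [0.1, 0.2, 0.9, 0.1, 0.3]
kept, acc = filter_by_uncertainty(preds, truth, unc, reject_fraction=0.2)
```

Filtering only helps when uncertainty correlates with error, which is why the uncertainty-error correlation is reported alongside accuracy.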

Intelligent Wealth Management Platform (Contract, Live)

AI-driven investment optimization with algorithmic risk management - handling real money in production for UK-based client focused on Indian markets.

Systems

Focus: FinTech infrastructure · Quantitative finance · ML for trading · Risk management

Production backend for UK-based wealth management startup targeting Indian equity markets. 60,000+ data points (Nifty 50 × 5 years × 250 trading days) via Kite Connect API. Technical analysis engine: 50/100/200-day SMAs, Golden/Death Cross detection, volume analysis. Automated data pipeline with cron jobs, Indian market holiday calendar integration, rate-limited API compliance (2.8 req/sec). Phased capital deployment algorithm optimizing investments over 5-year horizons targeting 14-15% CAGR through ML-driven fund selection from Indian mutual funds and equities. Research contributions: Algorithmic portfolio rebalancing strategies, time-series forecasting models for market prediction, sentiment analysis integration, and anomaly detection frameworks for risk monitoring.

Java 17 · Spring Boot 3.2 · MongoDB · Kite Connect API · Technical Analysis · +1 more
Jan 2023 – Present
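The moving-average signals named in the wealth management description (SMAs with Golden/Death Cross detection) can be sketched in a few lines. The production engine is a Java/Spring Boot backend, so this Python version only illustrates the technique. The function names and the tiny price series are hypothetical.

```python
def sma(prices, window):
    """Simple moving average; None until the window is full."""
    out, running = [], 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def detect_crosses(prices, short_window=50, long_window=200):
    """Return (index, 'golden' | 'death') where the short SMA crosses the long SMA."""
    short, long_ = sma(prices, short_window), sma(prices, long_window)
    events = []
    for i in range(1, len(prices)):
        if None in (short[i - 1], long_[i - 1], short[i], long_[i]):
            continue  # not enough history yet
        prev_diff = short[i - 1] - long_[i - 1]
        curr_diff = short[i] - long_[i]
        if prev_diff <= 0 < curr_diff:
            events.append((i, "golden"))  # short SMA moves above long SMA
        elif prev_diff >= 0 > curr_diff:
            events.append((i, "death"))   # short SMA moves below long SMA
    return events


# Small windows so the toy series produces both kinds of cross
toy_prices = [10, 9, 8, 7, 8, 9, 10, 11, 10, 8, 6, 4]
print(detect_crosses(toy_prices, short_window=2, long_window=4))
# [(5, 'golden'), (9, 'death')]
```

With the default 50/200-day windows, no signal can appear until at least 200 closing prices are available per instrument.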

MaSoVa Restaurant Management System

Production-grade microservices platform - Domino's-style operations at scale with real-time orchestration.

Systems

Focus: Microservices architecture · Real-time systems · Production engineering · Scalability

9 microservices with Spring Cloud Gateway (WebFlux): User, Menu, Order, Payment, Delivery, Analytics, Inventory, Customer, Store. 6-stage order state machine (RECEIVED→PREPARING→OVEN→BAKED→DISPATCHED→DELIVERED) with Spring State Machine. Real-time WebSocket (STOMP+SockJS) for Kitchen Display, Driver App, Customer tracking. React 18 + RTK Query with 16 API slices, neumorphic UI. JWT auth with role hierarchy (Customer/Staff/Driver/Manager). Multi-tenant store isolation. Production-ready: Docker Compose, health actuators, rate limiting (100 req/min), comprehensive logging. 13/17 phases complete (backend+frontend). Designed for real chain operations, not demo.

Java 21 · Spring Boot 3 · Spring Cloud Gateway · MongoDB · Redis · +4 more
2024 – Present
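The six-stage order lifecycle in MaSoVa (RECEIVED→PREPARING→OVEN→BAKED→DISPATCHED→DELIVERED) is a strictly linear state machine. The real system uses Spring State Machine in Java, so the Python sketch below only illustrates the transition rule. The `Order` class and `advance` method are hypothetical names.

```python
from enum import Enum


class OrderState(Enum):
    RECEIVED = 1
    PREPARING = 2
    OVEN = 3
    BAKED = 4
    DISPATCHED = 5
    DELIVERED = 6


# Each state has exactly one legal successor; DELIVERED is terminal
NEXT_STATE = {
    OrderState.RECEIVED: OrderState.PREPARING,
    OrderState.PREPARING: OrderState.OVEN,
    OrderState.OVEN: OrderState.BAKED,
    OrderState.BAKED: OrderState.DISPATCHED,
    OrderState.DISPATCHED: OrderState.DELIVERED,
}


class Order:
    def __init__(self):
        self.state = OrderState.RECEIVED

    def advance(self):
        """Move to the next stage, rejecting transitions out of a terminal state."""
        nxt = NEXT_STATE.get(self.state)
        if nxt is None:
            raise ValueError(f"{self.state.name} is terminal")
        self.state = nxt
        return self.state
```

Keeping the transitions in one table means an illegal jump (for example RECEIVED straight to DISPATCHED) cannot be expressed at all, which is the main benefit of modelling the flow as a state machine.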

AgrBIG – Precision Agriculture Big Data Architecture

MSc big data module project treated as a systems case study.

ML

Focus: Data architectures · Scalability · Agriculture

Designed ingestion from IoT sensors, satellite/drone imagery, social feeds, and market data into a dual-layer batch + streaming architecture. Focus on confidentiality, petabyte-scale storage, and decision-support analytics for agricultural stakeholders.

Big Data · Batch + Streaming · IoT · Satellite · Security
2021 – 2022
System Design

Architecture Deep Dive

Interactive architecture diagrams showing how I design production systems - from data ingestion through ML pipelines to user-facing applications.

System Architecture Diagram: Innosolv F&O Trading Platform (Algorithmic Options Trading)
  • Frontend layer (Thymeleaf + HTML/CSS/JS): Strategy Scanner UI, Position Analysis Dashboard, Order Management (Execution UI), OI Analysis (Put-Call Ratio).
  • Spring Boot application (Spring Boot 2.5.11 + Java 17): REST Controllers, Scheduled Tasks, Dependency Injection, OkHttp Client.
  • Service layer: Strategy Scanner (multi-threaded), Order Execution (position-aware), Risk Management (automated square-off), Position Analyzer (P&L calculator).
  • Integration & persistence layer: Zerodha KiteConnect SDK (market data & orders), Bucket4j Rate Limiter (10 calls/sec), MongoDB (config & history).
Overview

Innosolv F&O Trading Platform

Algorithmic options trading system with intelligent strategy discovery and automated risk management for Indian derivatives markets.

Data Layer
  • Real-time strategy scanner: Evaluates Bull/Bear spreads and Iron Condors across all NSE strike prices with parallel margin calculation (10-thread processing).
  • Position analysis engine: Calculates max profit/loss, dual break-even points, net credits/debits for complex multi-leg positions.
  • Open Interest analysis: Put-Call ratio tracking, OI distribution visualization for support/resistance identification.
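The position metrics listed above (max profit/loss, dual break-even points, net credit) have closed forms for a standard Iron Condor, and the Put-Call ratio is a single division. The sketch below works per unit and ignores lot size, fees, and margin. The class name, field names, and strike values are illustrative assumptions, not the platform's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IronCondor:
    long_put: float    # lowest strike (bought)
    short_put: float   # sold put
    short_call: float  # sold call
    long_call: float   # highest strike (bought)
    net_credit: float  # premium received minus premium paid, per unit

    def max_profit(self):
        return self.net_credit

    def max_loss(self):
        widest_wing = max(self.short_put - self.long_put,
                          self.long_call - self.short_call)
        return widest_wing - self.net_credit

    def break_evens(self):
        return (self.short_put - self.net_credit,
                self.short_call + self.net_credit)

    def payoff_at_expiry(self, spot):
        put_loss = max(self.short_put - spot, 0) - max(self.long_put - spot, 0)
        call_loss = max(spot - self.short_call, 0) - max(spot - self.long_call, 0)
        return self.net_credit - put_loss - call_loss


def put_call_ratio(put_open_interest, call_open_interest):
    """PCR by open interest; values above 1 mean more open puts than calls."""
    return put_open_interest / call_open_interest


# Hypothetical strikes and credit, chosen only for the arithmetic
condor = IronCondor(21800, 22000, 22500, 22700, net_credit=60)
```

For these numbers the break-evens are 21,940 and 22,560, and the worst case is the 200-point wing width minus the 60 credit.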
System Design
  • Spring Boot 2.5.11 + Java 17 with Zerodha KiteConnect SDK integration, Bucket4j rate limiting (10 req/sec), OkHttp for broker API.
  • Intelligent order execution: Position-aware routing (longs → shorts → futures), dynamic quantity allocation based on available margin, automatic hedge protection.
  • Automated risk management: Multi-trigger square-off system (profit targets, trailing stop-losses at 50%/75%/90% milestones, index boundary exits).
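The 10 req/sec limit above is enforced with Bucket4j, which is a token-bucket library. For readers unfamiliar with the technique, here is a minimal token bucket in Python. It is an illustration of the algorithm only and does not mirror Bucket4j's API. The class name and the explicit `now` argument (used to keep the example deterministic) are assumptions.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.last = now

    def try_consume(self, now, tokens=1):
        # Refill in proportion to elapsed time, capped at capacity
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = max(self.last, now)
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False  # caller should wait or drop the request


# 10 calls per second, matching the limit quoted above
bucket = TokenBucket(capacity=10, refill_per_sec=10.0)
first_burst = [bucket.try_consume(now=0.0) for _ in range(11)]
after_half_second = [bucket.try_consume(now=0.5) for _ in range(6)]
```

In a live system `now` would come from a monotonic clock such as `time.monotonic()`. Passing it in explicitly just makes the behaviour easy to test.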
Real-World Impact
  • Research contributions: Multi-stage filtering pipeline (15× speed improvement), position-aware execution algorithms preventing naked exposures, multi-milestone trailing stop-loss optimization.
  • Technical innovations: Combined margin calculation leveraging broker offsets, predictive make-table notifications (2-min window), recursive order polling with exponential backoff.
  • Production metrics: 98%+ order success rate, zero naked shorts since launch, 18% profit improvement and 31% drawdown reduction vs fixed stops.
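Order polling with exponential backoff, mentioned among the technical innovations above, can be sketched as follows. The source describes the polling as recursive. This version uses a plain loop, which behaves the same for a bounded number of attempts. The function name, the delay parameters, and the set of terminal statuses are assumptions to verify against the broker's documentation.

```python
import time

# Assumed terminal statuses; confirm against the broker's order-status documentation
TERMINAL_STATUSES = {"COMPLETE", "REJECTED", "CANCELLED"}


def poll_order(fetch_status, order_id, base_delay=0.5, factor=2.0,
               max_attempts=6, sleep=time.sleep):
    """Poll until the order reaches a terminal status or attempts run out.

    `fetch_status` is any callable taking an order id and returning a status string.
    Returns (last_status, attempts_used).
    """
    delay = base_delay
    status = None
    for attempt in range(1, max_attempts + 1):
        status = fetch_status(order_id)
        if status in TERMINAL_STATUSES:
            return status, attempt
        if attempt < max_attempts:
            sleep(delay)     # wait before the next poll
            delay *= factor  # each wait is `factor` times longer than the last
    return status, max_attempts


# Usage with a stubbed status source and a recorded (not real) sleep
responses = iter(["OPEN", "OPEN", "COMPLETE"])
waits = []
result = poll_order(lambda oid: next(responses), "order-1", sleep=waits.append)
```

Injecting `sleep` keeps the function testable, and capping `max_attempts` bounds the total wait so a stuck order cannot block the caller indefinitely.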
Code Samples

Code & Craft

A peek into my thinking across React components, API design, and data pipelines.

PySpark + Delta Lake
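# Azure Databricks - ETL Transformation with PySpark
from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # avoids shadowing Python's built-in sum()

# Databricks notebooks predefine `spark`; getOrCreate() reuses that session
spark = SparkSession.builder.getOrCreate()

# Read from Azure Data Lake Gen2
df = spark.read.format("delta").load(
    "abfss://raw@datalake.dfs.core.windows.net/sales/"
)

# Transform: keep completed orders, aggregate, derive KPIs
sales_kpi = (df
    .filter(F.col("order_status") == "COMPLETED")
    .groupBy("region", "product_category")
    .agg(
        F.sum("revenue").alias("total_revenue"),
        F.avg("order_value").alias("avg_order_value"),
        F.count("*").alias("order_count")
    )
    .withColumn("performance_tier",
        F.when(F.col("total_revenue") > 100000, "High")
        .otherwise("Standard"))
)

# Write to Synapse Analytics for Power BI via the Databricks Synapse connector.
# The <...> values are placeholders for the JDBC URL and a staging directory.
(sales_kpi.write
    .format("com.databricks.spark.sqldw")
    .option("url", "<synapse-jdbc-url>")
    .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/<dir>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "warehouse.sales_kpi")
    .save())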

Tech Stack

Languages

Python · SQL · Java · TypeScript · C/C++

ML & Data Science

TensorFlow · PyTorch · Scikit-learn · Pandas · NumPy · LangChain · HDBSCAN · PyOD

Big Data & Cloud

Apache Spark · PySpark · Databricks · Azure Data Factory · Synapse Analytics · Data Lake Gen2 · MongoDB · Redis

Visualization

Power BI · Tableau · Streamlit · Plotly · Matplotlib

Backend & Web

Spring Boot · Next.js · React · FastAPI · REST APIs · WebSocket · Git

Azure Data Engineer Associate

Microsoft Certified (DP-203) • March 2025

Academic Background

Research-Driven Education

Approached my MSc as a research-intensive experience — each module was an opportunity for experimental design, reproducible results, and professional presentation.

Master's Degree

MSc Data Science

University of Greenwich, London, UK

2021 – 2022

Graduated with Merit

Treated the MSc program as a research-intensive experience rather than standard coursework. Each major module was approached as an opportunity to conduct mini-research studies requiring experimental design, reproducible results, and professional presentation formats.

Data Visualisation & Exploratory Analytics (COMP1800)

Retail Dataset Investigation

Systematic exploration of large, messy retail datasets — 8 distinct visualizations with narrative analysis, translating quantitative findings into actionable business insights.

Interactive Jupyter · 8 Visualizations · Business Insights · Stakeholder Reports

Big Data Systems & Architectures (COMP1702)

AgrBIG — Precision Agriculture Platform

End-to-end architecture design for agricultural data management — IoT sensors, satellite imagery, drone surveillance, real-time streaming, and petabyte-scale storage.

IoT Integration · MapReduce · Real-time Streaming · Petabyte Scale · Architecture Design

Applied Machine Learning (COMP1804)

Facial Attribute Recognition System

TensorFlow-based multi-label classification system for 5 facial attributes (wrinkles, freckles, glasses, hair color/style) with manual annotation pipeline and comprehensive evaluation.

TensorFlow · Multi-task Learning · Computer Vision · Data Annotation · Deep Learning

Machine Learning (COMP1801)

Supervised & Unsupervised Learning Research

Comprehensive research in ML foundations covering regression, classification, clustering, optimization, and kernel methods with rigorous mathematical analysis and practical implementations.

Scikit-learn · Statistical Learning · Optimization · LaTeX Documentation · IEEE Format

Clouds, Grids & Virtualization (COMP1680)

HPC & Parallel Programming

Cloud platform analysis and parallel programming with OpenMP — performance benchmarking, scalability analysis, and distributed computing architectures for ML workloads.

OpenMP · Parallel Computing · Cloud Architecture · Performance Analysis · HPC

Graph & Modern Databases (COMP1835)

NoSQL & Graph Database Systems

Designed and implemented NoSQL systems across 4 paradigms (document, key-value, column-family, graph) with Neo4j, MongoDB, Redis — polyglot persistence for big data.

Neo4j · MongoDB · Redis · Graph Theory · Polyglot Persistence

Programming Fundamentals for Data Science (COMP1832)

Python & R Ecosystems

Mastered data science programming with NumPy, Pandas, Matplotlib, NetworkX in Python and R — portfolio-based assessment covering data structures, processing, and visualization.

Python · R · NumPy/Pandas · Data Wrangling · Visualization

MSc Dissertation Project

UK Transportation Network Analysis

National-scale GTFS analysis of UK public transportation (10 regions) — route classification by urban typology using NUTS framework, geospatial integration, and statistical profiling.

GTFS Data · Geospatial Analysis · Transportation Analytics · Python · Statistical Analysis
Bachelor's Degree

B.Tech Electronics & Communication Engineering

GITAM Institute of Science and Technology, Visakhapatnam, India

2016 – 2020

8.3 CGPA

Dissertation: Hyperspectral Image Analysis

Research in Remote Sensing & Image Processing

Applied signal processing and machine learning techniques to hyperspectral imagery — foundation for later work in computer vision and data-intensive research.

C/C++ · Data Structures · Computer Networks · Digital Signal Processing · Digital Logic Design

Contact Information

LinkedIn
souramarti
Location
Hyderabad, India

References Available

  • Stef Garasto

    Senior Lecturer in Data Science at University of Greenwich. MSc thesis supervisor for UK bus analytics research and ML platform development.

  • Innosolv Private Limited, London, UK

    Industry reference for live trading backend systems and production engineering quality. Direct experience with high-frequency financial systems.

  • Dr. Ladi Sandeep Kumar

    Assistant Professor at Gandhi Institute of Technology & Management (GITAM) University, Visakhapatnam. B.Tech supervisor for hyperspectral imaging research and signal processing expertise.