ABOUT

Maroof Abdul Aziz

M.Sc. Robotic Systems Engineering @ RWTH Aachen
AI/LLM Developer | Internship & Thesis @ Audi | Ex-Mercedes-Benz

I’m a Master's student in Robotic Systems Engineering at RWTH Aachen, specializing in AI for language and vision. My work focuses on developing and optimizing large language models (LLMs), speech processing pipelines, and computer vision systems.

With industry experience at Audi and Mercedes-Benz and a first-author IEEE publication, I bridge academic research and real-world AI deployment by building scalable, efficient machine learning systems.

EXPERIENCE

Master Thesis – LLM Optimization

Audi AG

Oct 2024 – Present

Topic: Optimization of Small Language Models for Embedded Voice Assistance.
Research and develop language models tailored to specific use cases through fine-tuning, model compression, and acceleration techniques.
Built synthetic datasets simulating realistic vehicle assistant queries and tool usage.

Internship – ChatGPT TTS Integration

Audi AG

Apr 2024 – Sep 2024

Project: Integration of ChatGPT into the online speech processing of a voice assistant.
Developed test datasets and evaluated multilingual language models using LLMs for text-matching and language processing tasks.
Conducted performance testing for scalable backends in automotive speech systems and automated reporting in CI/CD pipelines.
Contributed to Agile (Scrum) development processes and streamlined data analysis workflows.

Research Assistant – Medical Imaging

RWTH Aachen University

Aug 2023 – Mar 2024

Developed and tested deep learning models for detecting, classifying, and grading tumors in high-resolution images.
Conducted literature and dataset research.
Published in IEEE ICIP 2024.

Senior Product Design Engineer

Mercedes-Benz R&D India

Dec 2015 – Jul 2022

Developed Python-based automation tools for design and validation processes.
Collaborated with cross-functional teams to prototype and design cost-effective, manufacturable automotive exterior parts.

PROJECTS

Driver Drowsiness Detection

May 2023 – Jul 2023

Built a drowsiness detection model using CNNs and facial landmarks on 40K+ driver images.
Achieved 90% accuracy with a custom CNN and 86% with ResNet50 transfer learning.
Used OpenCV for facial landmark detection to identify drowsy behavior.
Compared traditional CNN, OpenCV, and transfer learning approaches for performance.
Collaborated with TechLabs Aachen team during the Digital Shaper Program.

View on GitHub ↗

LangGraph Agent Deployment

Feb 2024 – Apr 2024

Full-stack RAG application using FastAPI, LangGraph, and Streamlit for tool-using LLM agents.
Document ingestion from websites, PDFs, and SQL into a Qdrant vector store using LlamaIndex.
OpenAI and Groq models with session memory, tool calls, and LLM-based reranking.
CI/CD pipeline with GitHub Actions to automate testing and deploy Docker containers to AWS EC2.
User-friendly chat interface for uploading data, switching models, and visualizing agent reasoning.

View on GitHub ↗

Master Thesis: Optimizing Small Language Models

Jan 2024 – May 2024

Optimized small language models for CPU-only embedded systems and in-vehicle voice assistants.
Designed and fine-tuned models using QLoRA with special tokens for tool-call accuracy.
Applied structured pruning and quantization (GPTQ, GGUF) for efficient model compression.
Built synthetic datasets simulating realistic vehicle assistant queries and tool usage.
Benchmarked models using real-world metrics: latency, memory, accuracy, and on-device inference speed.
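
As an illustration of the latency and inference-speed metrics listed above, the following is a minimal sketch of a timing harness, not the benchmark code used in the thesis. The function name `benchmark_latency`, the `generate` callable (assumed to take a prompt and return a sequence of tokens), and the choice of a nearest-rank 95th percentile are all assumptions made for this example.

```python
import math
import statistics
import time
from typing import Callable, Dict, List, Sequence


def benchmark_latency(
    generate: Callable[[str], Sequence[str]],  # hypothetical model wrapper
    prompts: List[str],
    warmup: int = 1,
) -> Dict[str, float]:
    """Time `generate` on each prompt and summarize latency and throughput."""
    if not prompts:
        raise ValueError("prompts must not be empty")

    # Warm-up calls are not timed, so one-off costs such as lazy loading
    # or cold caches do not skew the measurements.
    for prompt in prompts[:warmup]:
        generate(prompt)

    latencies: List[float] = []
    total_tokens = 0
    for prompt in prompts:
        start = time.perf_counter()
        tokens = generate(prompt)
        latencies.append(time.perf_counter() - start)
        total_tokens += len(tokens)

    ordered = sorted(latencies)
    # Nearest-rank 95th percentile of the per-prompt latencies.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    total_time = sum(latencies)
    return {
        "mean_s": statistics.mean(latencies),
        "p95_s": ordered[p95_index],
        "tokens_per_s": total_tokens / total_time if total_time > 0 else float("inf"),
    }
```

Memory and accuracy would need separate measurements; this sketch covers only wall-clock latency and token throughput.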

LangChain RAG Agent Suite

Nov 2023 – Jan 2024

Modular RAG framework using LangChain + LlamaIndex.
RAG pipelines with multi-source ingestion: web, PDFs, and SQL databases.
Wikipedia, Arxiv, and Tavily tools for agent-based question answering.
Used FAISS for vector storage and LLM-based reranking for response relevance.
FastAPI backend and Streamlit frontend for interactive multi-modal user experience.

View on GitHub ↗

Cancer Detection on Whole Slide Images

Jul 2023 – Oct 2023

Cancer detection and subtyping using whole-slide histopathology images.
Attention-based heat maps to highlight tumor regions for interpretability and patch selection.
Integrated patch-level annotations and center loss to improve classification accuracy and feature separation.
Improved preprocessing with white-patch filtering and magnification normalization for consistent patch quality.
Enhanced RCC subtype classification, achieving better AUC, bACC, and F1 than baseline models.
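
The white-patch filtering step mentioned above can be sketched roughly as follows. This is an illustrative example only, not the project's actual preprocessing code: the function names, the representation of a patch as nested lists of RGB tuples, and the thresholds (`white_level=220`, `max_white=0.8`) are assumptions chosen for the sketch.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each in 0..255
Patch = Sequence[Sequence[Pixel]]     # rows of pixels


def white_fraction(patch: Patch, white_level: int = 220) -> float:
    """Return the fraction of pixels whose R, G and B are all >= white_level."""
    total = 0
    white = 0
    for row in patch:
        for r, g, b in row:
            total += 1
            if min(r, g, b) >= white_level:
                white += 1
    # An empty patch carries no tissue, so treat it as pure background.
    return white / total if total else 1.0


def keep_tissue_patches(
    patches: List[Patch],
    max_white: float = 0.8,   # assumed threshold, tune per dataset
    white_level: int = 220,   # assumed threshold, tune per scanner/stain
) -> List[Patch]:
    """Drop patches that are mostly white slide background."""
    return [p for p in patches if white_fraction(p, white_level) <= max_white]
```

In practice the same check is usually run on image arrays rather than nested lists; the list form is used here only to keep the sketch dependency-free.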

View on GitHub ↗
