
Mohit Rajput

Machine Learning Software Engineer | Data Scientist | AI/ML/DL | IIT Roorkee


Areas of Professional Interest
  • Reinforcement Learning, Unsupervised Learning and Recommendation Systems
  • Data Collection, Data Engineering and Experiment Design
  • Edge Device based ML
  • Data based Product Research and Development
  • Deep Learning based solutions (RNN, CNN, LSTM, SOM, Auto Encoder)
  • Natural Language Processing (NLP) and ChatBots
  • Time Series analysis & Computer Vision

Experiences
CandorHealth [Full-time]
Bangalore
Senior Machine Learning Software Engineer
Mar'23 - Present

Experience in this company:

  • National Provider Identifier (NPI) filler service for missing NPIs
    • Filled in missing information for multiple healthcare provider cases using the data available from the data source.
    • Tags: Identity Resolution, Similarity Mapping, Search, Parallel Processing, Big Data, APIs, Redis, Spark

Amazon Seller Services Private Limited - Analytics Technology and Engineering (ATE) [Full-time]
Bangalore
Data Scientist II
Apr'22 - Jan'23
Engineer
Feb'23 - Mar'23

Experience in this company:

  • Day Ahead Capacity Planning
    • Provided day-ahead forecasts for the last mile to the delivery stations, enabling them to plan their logistics.
    • Cost savings come from minimizing overbooking and underbooking of logistics.
    • Tags: Time Series, Curve Matching, ML, Sagemaker, EC2, Lambda, Event Bridge, S3, Dashboard
  • Common Utilities Python Lib (v1)
    • A simple library containing common utilities for use across multiple projects.
    • Credential Keeping and Retrieving: This is based on AWS Secret Manager
    • SQL Query Executor: Run SQL jobs while capturing and providing the job stats and logging them too.
    • Logger: A file and table based logger wrapper
    • S3 Interaction: A wrapper containing functions for S3 based utility.
    • Operations: A set of certain class & functions for Path, Time, & Math related Operations
    • Data Evaluation & Clustering: Functions to evaluate and show the performance of regression and class-based data in much greater detail.
    • Data Analysis: Functions to identify missing data with respect to groups, plot the data, etc.
    • Document Interaction: Ability to connect to and retrieve data from Quip
    • Tags: AWS Services, DB, SQL, Evaluation, Logging, Analysis, Data Pipeline
  • Capacity Planning Performance Evaluation, Analysis & Dashboard
    • Developed and maintained capacity planning evaluation and analysis. It is robust enough to yield multiple insights and even generate plans for the current state, pilots, and areas for improvement.
    • Helps better evaluate multiple components of delivery across countries, stations, ship methods and cycles.
    • Tags: Time Series, EC2, QuickSight, S3, Excel
  • Multiple Reports Creation and Data Pipelines Testing
    • Created reports for Middle Mile performance evaluation and tested/adjusted a few data pipelines for reporting.

Cargill - Cargill Digital Labs [Full-time]
Bangalore
Machine Learning Software Engineer
Jun'19 - Apr'22

Experience in this company:

  • ML labs v2 setup
    • Defining ways of working with ML projects and other applications that have ML-related features
    • Bringing in sustainable practices for accelerated development while maintaining agility
    • Planning phase of data governance, 3rd Party Data Annotators, Portal development for data tagging, selection and versioning, data collection experiment designs and pipelines.
  • Estimating Growth Indicator of Chicken (POC)
    • Estimating the weight and age of the chicken based on video recording done using the mobile camera.
    • Tags: Agile Experiment Setup & Design, Object Detection, Image Standardizer, Estimator, Ensembling
  • Defective Nugget Detection and Accounting (POC then MVP)
    • Defective Nugget Detection and Tracking on conveyor belt with output projected on display for manual removal.
    • Tags: Jetson Nano, Raspberry Pi, Arduino Uno, Edge device, Object Detection, Tracking, on-premise, real-time
  • HR Chatbot (Myco) (MVP)
    • Initial product version was launched for Singapore and Australia with capabilities for form filling, ticket creation, ticket checking, and easy data integration from Excel sheets
    • Tags: Rasa, Redis, NLU, NDM, NLG, NLP, Chatbot, excel
  • Financial Commentary Generation (POC)
    • Developed a template-based table-to-text utility for the initial stage; this approach was deemed fit because the data sources were scarce and highly fragmented.
  • YOLOv5-v4 training and deployment integration on AWS suitable to multiple tailored needs.
    • Tags: Computer Vision, Sagemaker, ECR, S3, Lambda Function, API Gateway, IAM, CloudWatch
  • Computer Vision Utility
    • Streamlit based web application for parsing, labelling, visualizing annotation, with support for Yolo prediction.
    • Ability to work with local, S3 and pCloud storage.
  • MiApp feature involving weather data visualization using the GIS data.
  • Information Extraction from Ship Invoices (POC)
    • Extracted multiple pieces of information from content shared in old and current emails.
    • Tags: Text-to-Table, Analytics, Regex, Python Email to DB

ShieldSquare (Radware) - Research and Development Team [Full-time]
Bangalore
Data Scientist – R&D
Apr'18-May'19
Associate Data Scientist – R&D
Jun'17-Mar'18

Experience in this company:

  • ICLSSTA
    • Solely developed a framework that develops and evaluates unsupervised learning solutions in depth, driven from a single config file.
    • Tags: Framework, Clustering, Anomaly, Outlier, Conceptual Drift Detection, Ensembling, Support all Dimension reduction, Anomaly, and Clustering Algorithm from Sklearn, Add. Custom Algorithm
  • Adaptive Action Taking (AAT)
    • Developed a generalized module named Adaptive Action Taking (AAT), which uses reinforcement learning to automatically take action on incoming traffic on the web property.
    • Tags: Reinforcement Learning, Multi-Armed Bandit, Smart handling of Business Limits, Self-Adjusting
  • Deep Behaviour Analysis (IDBA)
    • Partially developed a module named Intent based Deep Behaviour Analysis (IDBA), which uses LSTM to yield encoded features and scores that are used with supervised, anomaly and clustering methods.
    • Tags: RNN, Auto-Encoder, Semi-Supervised Learning, Sequence Based Detection, Deep Learning
  • Threshold Online Reinforcement (ThOR)
    • Helped develop a module named Threshold Online Reinforcement (ThOR), which uses online learning to develop a probability score distribution over isotonic regression for action taking.
    • Tags: Isotonic Regression, Online Learning, Self-Adjusting, Smart handling of Business Limits
  • Second Level Module Integration
    • Developed and implemented end-to-end machine learning and reinforcement learning based solutions using ICLSSTA-AAT, IDBA-ICLSSTA-AAT, & IDBA-ThOR
    • Tags: Semi-Supervised, Supervised, unsupervised, reinforcement learning, behaviour analysis
  • Titan Batch Analyser
    • Developed and implemented Titan Batch Analyser, which uses multiple browser signatures, behaviour based rules, regular expressions and databases to generate a suspiciousness score to take action on traffic.
    • Tags: SQL & Python Implementation, ETL, Advanced SQL operations, Handling Multiple Datasets
  • Rule Scripts
    • Developed and implemented multiple rules based on network, device and browser fingerprints, and on the behaviour of visitors. These were adequately balanced between causal and correlated, yielding negligible false positives.
    • Tags: Domain knowledge, Behaviour Analysis, Rules development
  • Rule Mining
    • Created a recommendation system for highlighting possibly bad signatures coming in the traffic.
    • Tags: Association Rule Mining, Text Data Pre-processing
  • Developing Dynamic Moving Collective Intelligence which can evolve while working across multiple sid.
  • Other Works
    • Developed dashboards, visualizations and analysis spreadsheets to understand the overall behaviour of the traffic. Partially interactive visualization sheets were also developed in Python.
    • Tags: SQL and Spreadsheet/Python Implementation, Visualization, Result Sharing

Projects

Here are some projects I’ve developed

  IP camera (RTSP protocol) to consistent HTML Live Stream with offline Data Sync

  Edge Device based CV application for counting of sheets processed in a workshop [demo link]

  Sticker Detection and OCR [demo link]

  Image to Image Search - Flickr8k [demo link]

    The following searchers were created:
     1. RCH - Regional Color Histogram
     2. PTG - Pre Trained Grouping
     3. TAE - Trained Auto Encoder
     4. ICG - Image Caption Based Grouping (incomplete)

  Image to Image Search - Jewellery [demo link]

  Accounting for Pipe Bundles being Exported [demo link]

  Face Detection, Face Landmark Detection and Face Recognition Wrappers [demo link]

  Human Pose Estimation and Activity Recognition [demo link]

  GIS Temperature and Precipitation Time based changes visualization [demo link]

  WhatsApp based Chatbot (Hack around way)

  Selenium based web scraping and Web Application Control of Tinder & WhatsApp

  Streamlit deployed app for basic Computer Vision applications. [demo link]

  Streamlit based threshold adjustment for object detector and annotation visualizer

  Supervised and Unsupervised Learning Framework

    Easy selection of basic data pre-processing, transformation, dimension reduction/clustering/anomaly algorithm selection or pipeline creation, hyperparameters setting, ensemble, and EDA.

  Other projects

  •     Flying Taxi Business Case, Udacity
  •     Build a Scalable Data Strategy, Udacity
  •     Create an Iterative Design Path, Udacity

  •     Facial Keypoint Detection, Udacity
  •     Image Captioning, Udacity
  •     Landmark Detection & Tracking (SLAM), Udacity

  •     Part of Speech Tagging, Udacity
  •     Build an Adversarial Game Playing Agent - Isolation game playing agent, Udacity
  •     Build a Forward-Planning Agent - Cargo route planning, Udacity
  •     Sudoku solver, Udacity

  •     Twitter Sentiment Analysis
  •     Identifying customer segments
  •     Black Friday: Understanding the customer purchase behaviour, Analytics Vidhya
  •     Estimation of the audience score of movies, Coursera
  •     Predicting the way in which exercises were done, Coursera
  •     Forecasting bike rental demand in the Capital Bike Sharing Program, Kaggle
  •     Predicting Survival on the Titanic, Kaggle

Python Libraries

Here are the libraries that I’ve created

 1. Algorithmic Machine Learning Exploration and Exploitation Tool (AMLEET)

 A personal library comprising a general code base for faster development. It supports the following.

  • General: Code Visual in Terminal; operations related to list, dict, datetime, pandas and numpy; Runtime Python cmdline support; Git status info; system performance; and many more.
  • Logger: Initializing, Method.
  • Configuration: Generator; Accepts json, yaml, cmdline, configparser and dict; Merge multiple configs; etc
  • Notifications: Support sending Emails and SMS
  • Storage: Ability to work with data lakes such as pCloud and S3.
  • DB Support: Provides functionality to work with tables and DBs. Additionally supports Google BigQuery.
  • Computer Vision:
    • General
    • Work With Streams
    • Color Threshold
    • Draw on Frame
    • Manipulate Frame
    • Image Transformation for Transfer
    • Video to Images
    • Images to Videos or GIF
    • Annotation Conversions and Plotting
    • Image Pixel Standardizations
    • Multi Face Detection
    • Multi Face Landmark
    • Multi Recognition - best selection
    • Multi Pose Estimation
    • Training Data Creation
  • NLP:
    • Text Cleaning
    • EDA
    • Embedding Creation
  • Tables
    • Feature Analysis
    • Custom Scaling and Transformation
    • EDA
    • Transformation
  • Supervised & Unsupervised Learning
    • Wrapper on scikit-learn
    • Anomaly and outlier Detection
    • Custom Algorithm support
    • Framework to support config to select suitable algorithm and params
    • Ensembling
    • Support for multiple metadata export
  • Evaluation
    • Regression
    • Classification
    • Object Detection
  • Sections Still in Development
    • Redis Integration
    • GIS Data Support
    • Time Series
    • Network information extraction

 2. COCO Transformation Utility (CTU) [demo link]

 Developed as a part of ML-Labs@Cargill. Awaiting approval to be released as open source.

  • Enables modifying your COCO annotations in line with the transformations applied to the images.
  • Provides the capability to have augmented images and annotations.
  • Ability to plot multiple annotations on images.

Patent & Research Paper
  • First Author. “System and method for detecting bots based on iterative clustering and feedback-driven adaptive learning techniques”; US20200099713A1; link.
  • Second Author. “System and method for detecting bots using semi-supervised deep learning techniques”; US20200099714A1; link.
  • First Author. “Electric field and current assisted alignment of CNT inside polymer matrix and its effect on electrical and mechanical properties”, in International Journal for the Science and Technology of Polymers 2016; link.

Tools & Skills
  • ML Languages/Framework
    • Python, Scikit-Learn, Tensorflow, PyTorch
  • Other Language
    • R, SAS, Java, C++
  • Other
    • APIs, Google BigQuery, Docker, Selenium, Redis, Grafana, Kibana, HTML, CSS
  • Databases
    • SQL, NoSQL
  • Software
    • Tableau, Excel, XLMiner, Spyder, Jupyter Notebook, RStudio, GIMP, SolidWorks
  • Cloud Computing Services
    • Google Cloud Platform, Amazon Web Services, Microservices Architecture
  • Cloud Data lakes
    • S3, Cloud Storage, Drive, pCloud
  • Hardware
    • Jetson Nano, Raspberry Pi, Arduino Uno, IP Cameras, Sensors, basic electrical devices and circuits


Education
Indian Institute of Technology, Roorkee
2012 - 2017
Metallurgy and Material Engineering
B.Tech. + M.Tech. (Dual Degree Course)
  • Credit Courses:
    • Economics
    • Corporate Social Responsibility
    • Object Oriented Programming
    • Management Concepts and Practices
    • Marketing
    • Human Resource Management
    • Operations
    • Financial Management
    • Behaviour Psychology
Udacity - Data Product Manager
2021
Nanodegree

https://confirm.udacity.com/QGUGJTSK

Udacity - Computer Vision
2020
Nanodegree

https://confirm.udacity.com/4DMWY73G

Udacity - Artificial Intelligence
2018
Nanodegree

https://confirm.udacity.com/KGCWC7NW



Certificates

Languages
  • Hindi [Native]
  • English [Professional]

Interests
  • Reading
  • Cooking
  • PC Gaming
  • Bike Trips
  • Travelling