Making our Data Scientists and ML engineers more efficient (Part 2)

Vineet Kumar

Co-Founder, CEO US

In the last post, we briefly touched upon the concept of MLOps and one of its elements, namely the Feature Store. We intend to cover a few more interconnected topics that are key to successful ML implementations and realizing sustained business impact.

At Affine, we ensure that the lion’s share of our focus in an ML project is on:

Investing more time in building Great Features and Feature Stores than in ML algos (yes, please don’t frown upon this).
Setting up Robust and Reproducible Data and ML Pipelines to ensure faster and more accurate Re-training and Serving.
Following ML and Production Standard Coding Practices.
Incorporating Model Monitoring Modules to ensure that the model stays healthy.
Reserving the last hour of each business day for Documentation.
And last but not least, regularly reciting the Magic words: Automate, Automate, Automate!
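
To make the Model Monitoring point concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), which compares the distribution of a feature at training time with what the model sees in production. This is an illustration rather than a description of Affine's monitoring module: the function name, the equal-width binning and the smoothing constant are all assumptions made for this example.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compare two samples of one numeric feature; larger values mean more drift.

    Hypothetical helper for illustration: equal-width bins are taken from the
    range of the `expected` (training-time) sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


if __name__ == "__main__":
    train = [i / 100 for i in range(1000)]   # feature values seen in training
    live = [x + 3 for x in train]            # same feature, shifted in production
    print("no drift:", population_stability_index(train, train))
    print("shifted :", population_stability_index(train, live))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, but the right threshold depends on the feature and the business context, so treat it as a starting point rather than a standard.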

The following architecture is an attempt to simplify and encapsulate the above points:

This is the Utopian version of the ML architecture that every team aspires to. This approach attempts to address three aspects of ML Training and Serving: Reproducibility, Continuity (or Inter-connectedness) and Collaboration.

Reproducibility of model artifacts/predictions – The ML components (Feature Engineering, Dataset Creation, ML tuning, Feature Contribution, etc.) should be built in such a way that the output of each component can be reproduced simply and accurately at a later point in time. From an application standpoint, this may be required for a root-cause analysis (or model compliance), or to re-trigger the ML pipeline on a new dataset later. Equally importantly, if you can reproduce results accurately, it is strong evidence that the ML pipeline (Data to ML training) is stable and robust. This ultimately leads to reliable Model Serving.
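
One way to picture reproducibility is a training step that uses no hidden state and records a fingerprint of everything it consumed and produced. The sketch below is a toy under stated assumptions: `fingerprint` and `run_training` are hypothetical names, and the "model" is just a mean over a seeded training split, standing in for real feature engineering and tuning.

```python
import hashlib
import json
import random


def fingerprint(obj):
    """Stable SHA-256 hash of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_training(rows, params):
    """Toy stand-in for a training step: a seeded split plus a mean 'model'."""
    rng = random.Random(params["seed"])  # local RNG, no hidden global state
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * params["train_fraction"])
    train = shuffled[:cut]
    model = {"mean": sum(train) / len(train)}
    # Record everything needed to reproduce or audit this run later.
    return {
        "data_hash": fingerprint(rows),
        "params": params,
        "model": model,
        "model_hash": fingerprint(model),
    }


if __name__ == "__main__":
    rows = list(range(100))
    params = {"seed": 42, "train_fraction": 0.8}
    first = run_training(rows, params)
    second = run_training(rows, params)
    print("identical runs:", first == second)
```

If two runs with the same data hash and parameters ever disagree on the model hash, something in the pipeline depends on state that is not being tracked, which is exactly the kind of instability this aspect is meant to surface.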

Continuous Training aspect – This is the ML counterpart of what software engineering calls Continuous Integration, which refers to automating components like code merge, unit/regression testing, build, etc. to enable continuous delivery. A typical ML pipeline also comprises several components (Feature Selection, Model Tuning, Feature Contribution, Model Validation, Model Serialization and finally Model Registry and Deployment). The Continuous aspect ensures that each module of our ML pipeline is fully automated and fully integrated (parameterized) with the other modules, so that Data runs, Model runs, and ML Deployment runs happen seamlessly when the pipeline is re-triggered.
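
The idea of modules that are "fully integrated (parameterized)" can be sketched as a list of steps that all share one parameter dictionary and pass their output to the next step. Everything here is an assumption for illustration: the step names loosely mirror the components listed above, the "model" is a per-feature mean, and the registry step only builds a label instead of calling a real model registry.

```python
def select_features(state, params):
    """Keep only the configured feature columns."""
    keep = params["features"]
    rows = [{name: row[name] for name in keep} for row in state["rows"]]
    return {**state, "rows": rows}


def train_model(state, params):
    """Toy training: the 'model' is the mean of each selected feature."""
    rows = state["rows"]
    model = {name: sum(r[name] for r in rows) / len(rows) for name in rows[0]}
    return {**state, "model": model}


def validate_model(state, params):
    """Stop the run if the training set is too small to trust."""
    if len(state["rows"]) < params["min_rows"]:
        raise ValueError("validation failed: not enough rows")
    return {**state, "validated": True}


def register_model(state, params):
    """Stand-in for writing to a model registry (hypothetical label format)."""
    label = f"{params['model_name']}:v{params['version']}"
    return {**state, "registered_as": label}


PIPELINE = [select_features, train_model, validate_model, register_model]


def run_pipeline(rows, params, steps=PIPELINE):
    """Run every step in order; each step receives the previous step's output."""
    state = {"rows": rows}
    for step in steps:
        state = step(state, params)
    return state


if __name__ == "__main__":
    params = {"features": ["age"], "min_rows": 2,
              "model_name": "churn", "version": 3}
    rows = [{"age": 30, "income": 10}, {"age": 50, "income": 20}]
    print(run_pipeline(rows, params)["registered_as"])
```

Because every step reads from the same `params` and nothing is edited by hand between steps, re-triggering the pipeline on new data is a single call, and a failed validation stops the run before anything is registered.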

However, it all starts with “Collaboration” – Right Team and Right Mindset!

Before we delve deeper into each of these points, it is essential to touch upon one more topic – the need for a tightly-knit cross-functional team. It’s not pragmatic to expect the Data Scientists to handle all of the above aspects, and the same is true for ML engineers. However, for a successful MLOps strategy, it’s important to get outside of our comfort zones, learn cross-functional skills and collaborate closely. This means that the Data Scientists should learn to write production-grade code (modularization, testing, versioning, documentation, etc.). Similarly, the ML engineers should understand ML aspects like Feature Engineering and Model Selection to appreciate why these seemingly complex ML artifacts are critical in solving the business problem.

As for managing such a cross-functional team of people who bring different specializations, we need people who thrive in this knowledge-based ecosystem, where the stack keeps getting bigger every day. We need a ‘Jack of all Trades’: someone who knows a little bit of everything and possesses the articulation skills to bring out the best in the team.

Getting the right cross-functional team in place is the first (and unarguably the most important) piece of the puzzle. In the next article, we will go a bit deeper into each of the components described above. Please stay tuned…

About Author

Vineet is an industry veteran with nearly two decades of experience in the decision sciences sector. As a co-founder and chief solutions architect, Vineet and his team specialize in designing advanced analytics, AI, ML, and cloud solutions that enrich decision-making in Fortune 500 companies.


Copyright © 2024 Affine All Rights Reserved
