OpenVINO - Train and Deploy Neural Network (AI Model) in seconds onto IoT Edge device



Let's look at the challenges AI developers face in training and deploying AI models, and how OpenVINO and Azure IoT help solve them.

Challenges in training and deploying an AI model
  • Choosing a neural network model
  • Training/re-training until the model converges (a costly and time-consuming task)
  • Deploying it on an edge device (IoT device / laptop / desktop)

Prerequisites/System requirements - Free Subscription, Software, Hardware and setup:
Solution:

   This Azure Marketplace offering lets you train and deploy an ONNX model end to end in less than a minute. The application uses a Docker image built on OpenVINO with the ONNX Runtime execution provider (EP).

Details: How it works:

Step 1: Training using customvision.ai in three simple steps
  • Log in to customvision.ai and upload a few training samples (minimum 25)
  • Annotate them
  • Run a quick training
      Ref: Getting started with customvision

Step 2: Deploying the OpenVINO AI Vision Module onto an IoT Edge device
  • Click "Get It Now" on Azure Marketplace
  • Select the device from IoT Hub to deploy to
  • Once the deployment succeeds, you will see the "OpenVINOReadyToDeployAIVisionModule" Edge module running
  • Expected output: the camera stream rendered on the display
Step 3: Passing the ONNX model to the app with "Twin Updates"
  • Copy the ONNX model URL: customvision.ai -> select project -> Performance -> Export -> click ONNX -> copy the ONNX model URL
  • Go to "portal.azure.com" -> IoT Hub -> IoT Edge -> select the device
  • Click "setmodules" -> click "OpenVINOReadyToDeployAIVisionModule"
  • Select "Twin Module Settings" -> set inference_files_zip_url to the ONNX model URL copied in the first bullet of this step (it looks like inference_files_zip_url="onnx url path")
  • Finally click "Update" and "Review+Create"
  • Expected output: the OpenVINO app restarts the stream and starts running inference based on the ONNX model passed (object detection / image classification). The camera should be pointing at the object or image of interest for recognition/classification.
  • Note: if no NCS2 is connected, inference starts on the Intel CPU
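
To make the twin update in Step 3 more concrete, here is a minimal sketch of the setting as JSON, built in Python. The nesting under properties.desired is how module twins generally hold desired properties; how the portal form wraps it is an assumption, and the URL below is a hypothetical placeholder, not a real export link. Only the property name inference_files_zip_url comes from the steps above.

```python
import json


def build_twin_patch(onnx_model_url: str) -> dict:
    """Build a module-twin patch that sets inference_files_zip_url."""
    if not onnx_model_url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    return {
        "properties": {
            "desired": {
                # Property name taken from Step 3; the value is the copied ONNX URL
                "inference_files_zip_url": onnx_model_url,
            }
        }
    }


# Hypothetical placeholder; use the URL copied from customvision.ai instead
EXAMPLE_URL = "https://example.com/exported-onnx-model.zip"

patch = build_twin_patch(EXAMPLE_URL)
print(json.dumps(patch, indent=2))
```

The printed JSON is only meant to show the shape of the setting; paste your own URL in place of the placeholder.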



Note:

  • Setup is a one-time process. It takes some patience to go through the cloud setup the first time. Happy to answer any questions; leave a comment.
  • Deploying the application (Docker pull) takes a while, depending on network speed, but only once per device
    • Note: a lighter-weight Docker image is in the works





Q&A on Deep Learning concepts

1.       What is Deep Learning?
Deep learning is a subset of machine learning concerned with neural networks: learning algorithms that learn representations of data through the use of neural nets.
2.       What is Mean?
Mean: the average of all the numbers, i.e. their sum divided by how many there are:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

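3.       What is Variance and Standard Deviation?
Variance: the variance (σ²) measures how far the values in the data set are from the mean. It is the average of the squared differences from the mean (μ is the mean, N the number of values):
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

Standard Deviation: the square root of the variance. It tells how spread out the numbers are:
$$\sigma = \sqrt{\sigma^2}$$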
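
A quick way to check these three definitions is Python's built-in statistics module. The data set below is made up for illustration; pvariance and pstdev are the "population" versions, which divide by N as in the definitions above.

```python
import statistics

# Made-up sample data for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # sum of values / number of values
variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)
```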
                                          

                                                  
4.       What is Perceptron?
A perceptron (a type of artificial neuron) takes several binary inputs x1, x2, … and produces a single binary output.
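
Here is a minimal sketch of a perceptron in Python. It assumes the usual rule, which the text above does not spell out: each input has a weight, and the output is 1 when the weighted sum plus a bias is positive, otherwise 0. The weights and bias are picked by hand so that the perceptron behaves like a NAND gate.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias > 0 else 0


# Hand-picked values (an illustration, not learned) that implement NAND
NAND_WEIGHTS = [-2, -2]
NAND_BIAS = 3

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], NAND_WEIGHTS, NAND_BIAS))
```

Only the input pair (1, 1) gives 0; every other pair gives 1.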
                                 



                            
                              
5.       What is Sigmoid?
Just like a perceptron, a sigmoid neuron has inputs x1, x2, …. But instead of being just 0 or 1, these inputs can take any value between 0 and 1. Also like a perceptron, the sigmoid neuron has a weight for each input, w1, w2, …, and an overall bias b. Its output is not 0 or 1 but σ(w·x + b), where σ(z) = 1 / (1 + e^(−z)) is the sigmoid function, sometimes called the logistic function.
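
The same idea in a few lines of Python. This is just a sketch; the input, weight and bias values are arbitrary numbers chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(inputs, weights, bias):
    """Output of a sigmoid neuron: sigma(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Arbitrary example values
output = sigmoid_neuron(inputs=[0.2, 0.9], weights=[1.5, -2.0], bias=0.5)
print(output)
```

Unlike the perceptron, a small change in a weight or the bias gives a small change in the output, which is what makes learning by gradient descent possible.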
                                                 
                                                  



6.       What is Gradient?
            Gradient is another word for "slope". The higher the gradient of a graph at a point, the steeper             the line is at that point. A negative gradient means that the line slopes downwards. 
7.       Why “Gradient Descent”?
We saw that a sigmoid neuron uses the input x (ranging from 0 to 1), the weights w and the bias b to compute its output. But we still need a way to find good values for w and b. Gradient descent is one such method: it repeatedly adjusts w and b in the direction that lowers the cost.


8.       Explain “Gradient Descent” and “Stochastic Gradient Descent (SGD)”?
Both algorithms find a set of parameters that minimizes a cost/loss function by evaluating the parameters against data and then adjusting them.

In “Gradient Descent”: you evaluate all the training samples before each parameter update.
In “SGD”: you evaluate a single training sample (in practice often a small mini-batch) before each parameter update.

Both help find the weights and biases that minimize the cost function:

$$C(w,b) = \frac{1}{2n}\sum_{x} \lVert y(x) - a \rVert^2$$

       w -> weights, b -> biases, n -> number of training samples
  y(x) -> the desired (reference) output for input x
       a -> the output given by the network for a given x, w and b
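
The difference between the two is easiest to see in code. The sketch below fits a one-input linear model a = w·x + b with the quadratic cost above. The toy data, learning rates and step counts are all assumptions made for this example, not values from a real network.

```python
import random

# Toy data from the line y = 2x + 1 (made up for this sketch)
XS = [0.0, 0.25, 0.5, 0.75, 1.0]
YS = [2.0 * x + 1.0 for x in XS]


def gradient(w, b, samples):
    """Gradient of C = 1/(2n) * sum (y - a)^2 over the given (x, y) samples."""
    n = len(samples)
    dw = sum(-(y - (w * x + b)) * x for x, y in samples) / n
    db = sum(-(y - (w * x + b)) for x, y in samples) / n
    return dw, db


def gradient_descent(lr=0.5, steps=2000):
    w, b = 0.0, 0.0
    data = list(zip(XS, YS))
    for _ in range(steps):
        dw, db = gradient(w, b, data)  # all samples before each update
        w, b = w - lr * dw, b - lr * db
    return w, b


def sgd(lr=0.1, epochs=2000, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(XS, YS))
    for _ in range(epochs):
        rng.shuffle(data)
        for sample in data:
            dw, db = gradient(w, b, [sample])  # one sample before each update
            w, b = w - lr * dw, b - lr * db
    return w, b


gd_w, gd_b = gradient_descent()
sgd_w, sgd_b = sgd()
print("gradient descent:", round(gd_w, 3), round(gd_b, 3))
print("SGD:             ", round(sgd_w, 3), round(sgd_b, 3))
```

Both runs should end up close to w = 2 and b = 1. SGD makes many cheap updates per pass over the data, which is why it is preferred when the training set is large.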

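9.       Explain “Training” a CNN model?
Training means adjusting the model's weights and biases so that its outputs match the training data.
Supervised learning -> training on data that has been labeled with the expected output
Unsupervised learning -> training on data without labels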
           

10.   What is an epoch?
An epoch is a single pass through the entire training data set.

11.   Explain “Back propagation”?
Backpropagation gives an expression for the partial derivative of the cost function C with respect to any weight (or bias) in the network. The expression tells us how quickly the cost changes when we change the weights and biases.

The goal of backpropagation is to compute the partial derivatives ∂C/∂w and ∂C/∂b of the cost function C with respect to any weight w or bias b in the network.

Four Fundamental Equations behind BP
Backpropagation is about understanding how changing the weights and biases in a network changes the cost function. Ultimately, this means computing the partial derivatives $\partial C / \partial w^l_{jk}$ and $\partial C / \partial b^l_j$.

We first introduce an intermediate quantity, $\delta^l_j$, which we call the error in the jth neuron in the lth layer.

Backpropagation gives us a procedure to compute the error $\delta^l_j$, and then relates $\delta^l_j$ to $\partial C / \partial w^l_{jk}$ and $\partial C / \partial b^l_j$.
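
For reference, these are the four equations as they are usually stated in the textbook treatment this notation follows. The extra symbols are assumptions not defined above: $z^l$ is the weighted input to layer $l$, $a^l$ its activation, $L$ the last layer, $\sigma'$ the derivative of the activation function, and $\odot$ the elementwise product.

```latex
\begin{align}
\delta^L &= \nabla_a C \odot \sigma'(z^L) \\
\delta^l &= \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \\
\frac{\partial C}{\partial b^l_j} &= \delta^l_j \\
\frac{\partial C}{\partial w^l_{jk}} &= a^{l-1}_k \, \delta^l_j
\end{align}
```

The first equation gives the error in the output layer, the second carries the error backwards one layer at a time, and the last two turn the error into the derivatives we actually want.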
                                                           
                                            

                






“BP” is about understanding how changing the weights and biases in a network changes the cost function (for example, the cost being minimized by Gradient Descent/SGD). Ultimately this means computing the partial derivatives of the cost function w.r.t. the weights and biases.

The method calculates the gradient of the loss function with respect to all the weights in the network.
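
For a feel of what backpropagation computes, here is a sketch for the smallest possible network: a single sigmoid neuron with one input and the quadratic cost C = ½(y − a)². The numbers are arbitrary, and the finite-difference check is only there to confirm that the analytic gradient is right.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, x, y):
    a = sigmoid(w * x + b)
    return 0.5 * (y - a) ** 2


def backprop(w, b, x, y):
    """Analytic gradient of the cost for one sigmoid neuron."""
    a = sigmoid(w * x + b)
    delta = (a - y) * a * (1.0 - a)  # error: dC/da times sigma'(z)
    return x * delta, delta          # dC/dw, dC/db


def numerical_gradient(w, b, x, y, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    dw = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
    db = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)
    return dw, db


# Arbitrary example values
w, b, x, y = 0.6, 0.9, 1.0, 0.0
bp_dw, bp_db = backprop(w, b, x, y)
num_dw, num_db = numerical_gradient(w, b, x, y)
print("backprop: ", bp_dw, bp_db)
print("numerical:", num_dw, num_db)
```

The two rows should agree to many decimal places. In a real network the same error term is passed backwards layer by layer instead of being computed for one neuron.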

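12.   Examples “Cost Functions”?
Common cost functions include the quadratic cost (mean squared error), as in the C(w,b) formula above, and cross-entropy. Note that “Gradient Descent” and “Stochastic Gradient Descent (SGD)” are not cost functions but methods for minimizing one, and L1 and L2 regularization are extra terms added to a cost function.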

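13.   What are “Activation Functions”?
An activation function turns a neuron's weighted input z = w·x + b into its output. Sigmoid is one example.
Softmax: similar to sigmoid, but applied across a whole layer, so that the outputs are all positive and sum to 1:
$$a_j = \frac{e^{z_j}}{\sum_k e^{z_k}}$$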
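
A minimal softmax sketch in Python. Subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result; the input values are arbitrary.

```python
import math


def softmax(zs):
    """Turn a list of weighted inputs into positive outputs that sum to 1."""
    m = max(zs)                            # shift for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary example values
probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

Because the outputs sum to 1, they can be read as the network's estimated probability for each class.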
                                       




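14.   What are hyper parameters and How to select “Hyper Parameters”?
Hyperparameters are settings chosen before training rather than learned from the data, for example: learning rate, number of epochs, batch size (how many samples to process per training iteration) and the regularization parameter.
They are usually selected by trying different values and comparing the results on held-out validation data.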

15.   How to initialize “weights” and “biases” for given network/model? why?
Generally, weights and biases are initialized randomly from a Gaussian distribution with mean 0 and standard deviation 1.

Why does the choice matter? Ex: suppose a neuron has 1000 inputs, 500 of them are 0 and the remaining 500 are 1. Then z = w·x + b is a sum of 501 Gaussian terms (500 weights + 1 bias), so its variance is 501 and its standard deviation is √501 ≈ 22.4. That is, z has a very broad Gaussian distribution, not sharply peaked, so |z| is often large and a sigmoid neuron saturates, which slows down learning.
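
The 22.4 figure is easy to check with a small simulation. This sketch draws the 500 active weights and the bias from a standard Gaussian many times and measures the spread of z; the number of trials and the seed are arbitrary choices.

```python
import math
import random


def weighted_input_std(n_active_inputs, trials=2000, seed=0):
    """Estimate the standard deviation of z = w.x + b by sampling."""
    rng = random.Random(seed)
    zs = []
    for _ in range(trials):
        # Each active input is 1, so w.x is just the sum of those weights
        z = sum(rng.gauss(0.0, 1.0) for _ in range(n_active_inputs))
        z += rng.gauss(0.0, 1.0)  # the bias
        zs.append(z)
    mean = sum(zs) / trials
    return math.sqrt(sum((z - mean) ** 2 for z in zs) / trials)


analytic = math.sqrt(500 + 1)  # 500 weights + 1 bias, each with variance 1
simulated = weighted_input_std(500)
print("analytic std of z: ", round(analytic, 1))
print("simulated std of z:", round(simulated, 1))
```

The simulated value should land close to the analytic one. A common remedy is to draw the weights with a smaller standard deviation, such as 1/√(number of inputs), so that z stays in the range where the sigmoid is not saturated.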

              


16.   When do you stop “training”?
Negative case: stop if the cost is no longer going down, if test accuracy is no longer improving, or if the network is overfitting (cost decreases but accuracy doesn't improve).
Positive case: stop once the desired accuracy is reached.
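
Both cases can be combined into one simple stopping rule. The sketch below is one possible way to do it, not a standard API: the function name, the target accuracy and the "patience" (how many epochs without improvement we tolerate) are all assumptions, and the accuracy history is made up.

```python
def stopping_epoch(accuracies, target=0.95, patience=3):
    """Return (epoch, reason) at which training would stop.

    accuracies: accuracy on held-out data after each epoch.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(accuracies):
        if acc >= target:
            return epoch, "target reached"      # positive case
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, "no improvement"  # negative case
    return len(accuracies) - 1, "ran out of epochs"


# Made-up accuracy history for illustration
history = [0.60, 0.70, 0.74, 0.73, 0.74, 0.72, 0.75]
print(stopping_epoch(history))
```

With this history the accuracy fails to beat its best value for three epochs in a row, so training stops before the last entry is ever seen.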

17.   What is Overfitting? How to avoid it?
When the network's cost on the training data keeps decreasing but accuracy on the test data does not improve along with it, we say the network is overfitting, or overtraining.

We need to check how the cost on the training and test data varies as the network learns. If the two move in opposite directions (cost on the training set going down while cost on the test set goes up), it is an indication of overfitting: the network is no longer learning anything that generalizes.

Another sign is the classification accuracy on the training data. If accuracy on the training set is 100% while accuracy on the test set is ~80%, the network is memorizing the training samples rather than learning their features.

The obvious way to detect overfitting is to use the approach above, keeping track of accuracy on the test data as the network trains. If we see that the accuracy on the test data is no longer improving, we should stop training. It might be that accuracy on the test data and the training data both stop improving at the same time. Still, adopting this strategy will prevent overfitting.

  • Increasing the amount of training data is one way of reducing overfitting, though it is not always possible
  • Using regularization techniques such as weight decay, also called L2 regularization. The idea of L2 regularization is to add an extra term to the cost function, called the regularization term
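
Written out, the L2-regularized cost usually looks like this. The symbols are assumptions not defined above: $C_0$ is the original, unregularized cost, $\lambda > 0$ is the regularization parameter, $n$ is the size of the training set and $\eta$ is the learning rate.

```latex
C = C_0 + \frac{\lambda}{2n} \sum_{w} w^2

w \rightarrow \left( 1 - \frac{\eta \lambda}{n} \right) w - \eta \, \frac{\partial C_0}{\partial w}
```

The second line is the resulting gradient descent update for a weight. The factor in front of $w$ is slightly less than 1, so every update shrinks the weights a little, which is where the name "weight decay" comes from.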

                



18.   What is “dropout”?
“Dropout” is another technique for reducing overfitting. It does not change the cost function as L1 and L2 regularization do; instead it changes the network itself.

Normally, during training we forward-propagate the input x through the network and then backpropagate to compute the gradient. With dropout, this process is modified. We start by randomly (and temporarily) removing some of the hidden neurons in the network, while leaving the input and output neurons untouched. We forward-propagate and backpropagate through this thinned network for a mini-batch of training samples and update the weights and biases. We then restore the removed neurons, randomly remove a different subset of hidden neurons, and repeat the process for the next mini-batch.
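
The "randomly remove, train, restore, repeat" loop can be sketched with a simple mask over the hidden neurons. The activations, the drop probability of 0.5 and the seed are made-up values, and the actual forward and backward passes are left out.

```python
import random


def dropout_mask(n_hidden, drop_prob, rng):
    """1 keeps a hidden neuron for this mini-batch, 0 temporarily removes it."""
    return [0 if rng.random() < drop_prob else 1 for _ in range(n_hidden)]


def apply_dropout(hidden_activations, mask):
    """Zero out the activations of the removed hidden neurons."""
    return [a * m for a, m in zip(hidden_activations, mask)]


rng = random.Random(42)
hidden = [0.2, 0.9, 0.5, 0.7, 0.1, 0.4]  # made-up hidden-layer activations

for mini_batch in range(3):
    mask = dropout_mask(len(hidden), drop_prob=0.5, rng=rng)  # new subset each time
    print(mini_batch, mask, apply_dropout(hidden, mask))
```

This sketch leaves out one detail: after training, the full network is used, so the weights or activations have to be rescaled to make up for the fact that more neurons are now active.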

19.   What is L1 and L2 norm?
L1-norm is also known as least absolute deviations (LAD) or least absolute errors (LAE). It minimizes the sum S of the absolute differences between the target values (y_i) and the estimated values (f(x_i)):
$$S = \sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert$$

L2-norm is also known as least squares. It minimizes the sum S of the squares of the differences between the target values (y_i) and the estimated values (f(x_i)):
$$S = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$
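
Both sums are one-liners in Python. The target and estimated values below are made up for illustration.

```python
def l1_loss(targets, estimates):
    """Sum of absolute differences (least absolute deviations)."""
    return sum(abs(y - f) for y, f in zip(targets, estimates))


def l2_loss(targets, estimates):
    """Sum of squared differences (least squares)."""
    return sum((y - f) ** 2 for y, f in zip(targets, estimates))


# Made-up values for illustration
targets = [3.0, 5.0, 8.0]
estimates = [2.0, 5.0, 10.0]

print("L1:", l1_loss(targets, estimates))
print("L2:", l2_loss(targets, estimates))
```

Notice how the largest error (8 vs 10) counts for more under L2 than under L1, because it is squared. That is why L2 is more sensitive to outliers.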
                                                
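20.   Difference between L1 and L2 Regularization?
In L1 regularization the weights shrink by a constant amount toward 0, while in L2 regularization they shrink by an amount proportional to the weight. So when a weight is large, L1 regularization shrinks it much less than L2 regularization does; when a weight is small, L1 shrinks it much more. As a result, L1 tends to drive many weights to zero and keep a small number of important ones.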

21.   Explain “Precision” and “Recall”? (Ex: Face Detection)
TP -> faces that were correctly detected
FP -> non-faces that were detected as faces
FN -> faces that were missed

Precision -> percentage of detections that are actually faces: TP / (TP + FP)
Recall -> percentage of all faces that were detected: TP / (TP + FN)
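
As a quick sketch, here are the two formulas in Python with made-up counts from an imaginary face detector.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts: 80 faces found correctly, 20 false alarms, 10 faces missed
precision, recall = precision_recall(tp=80, fp=20, fn=10)
print("precision:", precision)
print("recall:", round(recall, 3))
```

Here 80 of the 100 detections are real faces, and 80 of the 90 real faces were found.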


22.   Formula to compute size of Layers?
Output size = floor((Width − Kernel Size + 2 × Padding) / Stride) + 1
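
The formula as a small Python helper. The layer settings in the examples are arbitrary but typical values, and integer division plays the role of rounding down.

```python
def conv_output_size(width, kernel_size, padding=0, stride=1):
    """Output width of a convolution (or pooling) layer along one dimension."""
    return (width - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(32, kernel_size=5))                        # no padding, stride 1
print(conv_output_size(224, kernel_size=7, padding=3, stride=2))  # strided convolution
print(conv_output_size(28, kernel_size=2, stride=2))              # 2x2 pooling
```

The same formula is applied separately to the height when the image or kernel is not square.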

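23.   How to calculate no.of.operations per layer (FLOPS)?
Operations = Kernel W × Kernel H × No. of output features × No. of channels of input image × Image W × Image H
Here Image W × Image H is the size of the layer's output feature map (the same as the input image size for stride 1 with "same" padding). Strictly speaking, this counts multiply-accumulate operations.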
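
A sketch of the same count in Python. The layer below (3x3 kernel, 3 input channels, 64 output features, 224x224 output map) is just an example configuration, and the function name is made up.

```python
def conv_operations(kernel_w, kernel_h, out_features, in_channels, out_w, out_h):
    """Multiply-accumulate count for one convolution layer (bias not included)."""
    return kernel_w * kernel_h * out_features * in_channels * out_w * out_h


# Example configuration, chosen for illustration
ops = conv_operations(kernel_w=3, kernel_h=3, out_features=64,
                      in_channels=3, out_w=224, out_h=224)
print(f"{ops:,} multiply-accumulates")
```

If you count a multiply and an add as two separate floating point operations, double the number.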

24.   How would you handle an imbalanced data set?
  • Try collecting more data to even out the imbalance
  • Resample the dataset to correct the imbalance
  • Try a different algorithm altogether on your dataset
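
Resampling is the easiest of the three to show in code. The sketch below does simple random oversampling: samples from the smaller classes are repeated at random until every class is as large as the biggest one. The file names and labels are made up, and this is only one of several resampling strategies.

```python
import random


def oversample(samples, labels, seed=0):
    """Randomly repeat minority-class samples until all classes are the same size."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_label.values())
    out_samples, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Made-up data: four "cat" images but only one "dog" image
samples = ["cat1.jpg", "cat2.jpg", "cat3.jpg", "cat4.jpg", "dog1.jpg"]
labels = ["cat", "cat", "cat", "cat", "dog"]

balanced_samples, balanced_labels = oversample(samples, labels)
print(balanced_labels.count("cat"), balanced_labels.count("dog"))
```

Oversampling should be applied to the training data only, otherwise repeated samples can leak into the test set and inflate the accuracy.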

25.   When should you use “classification” over “regression”?

Classification produces discrete values and assigns data points to strict categories, while regression gives you continuous results that let you better distinguish differences between individual points. You would use classification over regression if you want your results to reflect which explicit category each data point belongs to (ex: if you want to know whether a name is male or female, rather than just how strongly it correlates with male and female names).


Please leave a comment if you have any questions on the above Q&A, or if you want to ask a new question on the topic of Deep Learning.

Happy Reading.
