Q1 - What is a computational graph in Deep Learning?

A computational graph is a directed graph that represents a mathematical computation: nodes stand for operations or variables, and edges carry the data flowing between them. Expressing a model this way makes its dependencies explicit, so independent operations can run in parallel and frameworks can compute gradients automatically by traversing the graph backward.
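As a rough illustration, the sketch below builds a tiny graph for (x + y) * z in plain Python. The `Node` class and its `evaluate` method are hypothetical names made up for this example; real frameworks use far richer graph structures.

```python
import operator


class Node:
    """A graph node: either a constant input or an operation on other nodes."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # function applied to the inputs (None for a leaf)
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # stored value for leaf nodes

    def evaluate(self):
        # Leaves return their value; operations evaluate their inputs first.
        if self.op is None:
            return self.value
        return self.op(*(node.evaluate() for node in self.inputs))


x = Node(value=2.0)
y = Node(value=3.0)
z = Node(value=4.0)
total = Node(op=operator.add, inputs=(x, y))
out = Node(op=operator.mul, inputs=(total, z))

result = out.evaluate()  # (2 + 3) * 4
print(result)
```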

Q2 - What do you understand about text normalization in NLP?

When developing NLP models, we often encounter text that appears in different surface forms but carries the same meaning. Text normalization maps these variations onto a single canonical representation, for example by lowercasing, removing punctuation, stemming, or lemmatizing.
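A minimal sketch of the idea, assuming a very simple pipeline (lowercasing, stripping punctuation, collapsing whitespace). The `normalize` function is a hypothetical helper, and real pipelines usually add steps such as stemming or lemmatization.

```python
import re
import string


def normalize(text):
    """Map surface variations of a string onto one canonical form."""
    text = text.lower()
    # Remove all ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


variants = ["U.S.A.", "usa", "  USA "]
normalized = {normalize(v) for v in variants}
print(normalized)
```

All three variants collapse to the same token, which is the point of normalization.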

Q3 - You are given a 5x5 image with a 3x3 filter and a padding p = 1. What will be the resultant image's size if a convolutional stride of s = 2 is used?

For an n x n image with an f x f filter, padding p, and stride s, the output of the convolution has size (floor((n + 2p - f) / s) + 1) x (floor((n + 2p - f) / s) + 1). Hence, the resulting image will be (((5 + 2 * 1 - 3) / 2) + 1) x (((5 + 2 * 1 - 3) / 2) + 1) = 3 x 3.
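The same arithmetic can be checked with a short helper. The function name `conv_output_size` is an assumption for this sketch; it covers only square inputs and filters.

```python
def conv_output_size(n, f, p, s):
    """Output side length for an n x n input, f x f filter, padding p, stride s."""
    # Integer division implements the floor in the formula.
    return (n + 2 * p - f) // s + 1


size = conv_output_size(n=5, f=3, p=1, s=2)
print(f"{size} x {size}")
```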

Q4 - What is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of neural network designed to work with grid-like data such as images. It is made up of a series of layers, each of which detects certain features in the input. For example, the first layer might detect edges, the second layer might detect shapes, and so on.

Q5 - What are some popular CNN applications in Industry?

CNNs are used extensively in computer vision (CV) applications such as image classification, object detection, and face recognition.

Q6 - What is the significance of pooling layers in a CNN?

Pooling layers reduce the spatial dimensions of the feature maps, which lowers the amount of computation while keeping the most prominent features. The two most common types are max pooling and average pooling.
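As a small illustration, the sketch below applies 2 x 2 max pooling with stride 2 to a feature map stored as nested lists. The helper `max_pool_2x2` is hypothetical and assumes even height and width.

```python
def max_pool_2x2(image):
    """2 x 2 max pooling with stride 2 on a list-of-lists feature map."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4 x 4 map shrinks to 2 x 2, keeping only the largest value in each window.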

Q7 - What are Recurrent Neural Networks?

Recurrent neural networks (RNNs) are a type of artificial neural network designed to model sequential data. This makes them well suited for tasks such as natural language processing or time series prediction. RNNs can remember information from previous inputs, which allows them to better understand context and improve predictions.

Q8 - What makes RNN different from other neural networks?

The key difference between a recurrent neural network and a traditional feedforward neural network is that a recurrent neural network maintains an internal (hidden) state that lets it remember information about the inputs it has already seen. This internal state is what allows a recurrent neural network to model temporal dependencies.
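The role of the internal state can be sketched with a single scalar recurrent unit. The weights `w_x`, `w_h`, and `b` below are arbitrary values chosen for illustration, not learned parameters.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state mixes the input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run(sequence):
    h = 0.0  # initial hidden state
    for x in sequence:
        h = rnn_step(x, h)
    return h


# The two sequences differ only in their first element, yet the final
# state differs because the hidden state carries that information forward.
h_a = run([1.0, 0.0, 0.0])
h_b = run([0.0, 0.0, 0.0])
print(h_a, h_b)
```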

Q9 - What is the difference between LSTM, GRU, and RNNs?

LSTM and GRU are both types of recurrent neural network (RNN), a family of neural networks designed to handle sequential data. A plain (vanilla) RNN is the simplest member of the family and struggles with long sequences. LSTMs add a gated memory cell that handles long-term dependencies better, while GRUs use a simpler gating scheme with fewer parameters and are often faster to train.

Q10 - What is gradient descent?

Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively adjusts the model's parameters to minimize a given function, typically the loss. On a convex function it converges to the global minimum; on non-convex functions it may settle in a local minimum.

Q11 - What is the idea behind the Gradient Descent?

The main idea behind gradient descent is to take steps in the direction of the negative gradient. This is the direction of steepest descent, so repeated steps of a suitable size eventually lead to a (local) minimum.
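A minimal sketch of this idea on the one-dimensional function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3). The function, learning rate, and step count are assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Gradient of f(x) = (x - 3)^2, which has its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```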

Q12 - Can you briefly explain the BERT model?

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language representation model that aims at tackling various NLP tasks, such as question answering, language inference, and text summarization.

Q13 - How is BERT different from context-free models?

Context-free models such as word2vec or GloVe produce a single embedding per word regardless of the sentence it appears in, whereas BERT produces a representation for each token that depends on its surrounding context. BERT learns this through pre-training on two relatively generic tasks, Masked Language Modelling (MLM) and Next Sentence Prediction. The masked language model randomly masks some of the tokens in the input, and the objective is to predict the masked tokens based only on their context. The Next Sentence Prediction task trains the model to predict whether one sentence follows another, which teaches it dependencies between sentences.

Q14 - What part of BERT's architecture gives it Bidirectionality?

The Transformer encoder, which is a bidirectional self-attentive model, gives BERT its bidirectionality. In self-attention, every token in a sentence attends to every other token in that sentence, so each representation draws on context from both the left and the right.

Q15 - What is backpropagation in neural networks?

Backpropagation is the algorithm used to compute how much each weight in a neural network contributes to the loss. After a forward pass produces the error, backpropagation applies the chain rule to feed this error backward through the layers and obtain the gradient of the loss with respect to every weight. The optimizer then uses these gradients to update the weights and reduce the loss.
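The sketch below walks through one forward pass, one backward pass, and one weight update for a single linear neuron with squared-error loss. It is a deliberately tiny illustration: the values of `w`, `b`, `x`, `target`, and `learning_rate` are arbitrary assumptions, and a real network repeats the chain rule across many layers.

```python
def forward(w, b, x, target):
    """Forward pass: prediction and squared-error loss."""
    y = w * x + b
    loss = (y - target) ** 2
    return y, loss


def backward(w, b, x, target):
    """Backward pass: chain rule from the loss back to w and b."""
    y, _ = forward(w, b, x, target)
    dloss_dy = 2 * (y - target)  # d(loss)/d(y)
    grad_w = dloss_dy * x        # d(y)/d(w) = x
    grad_b = dloss_dy * 1.0      # d(y)/d(b) = 1
    return grad_w, grad_b


w, b, x, target = 0.5, 0.1, 2.0, 3.0
learning_rate = 0.05

_, loss_before = forward(w, b, x, target)
grad_w, grad_b = backward(w, b, x, target)

# One gradient step on each parameter.
new_w = w - learning_rate * grad_w
new_b = b - learning_rate * grad_b
_, loss_after = forward(new_w, new_b, x, target)
print(loss_before, loss_after)
```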

Q16 - What are batch size and epoch in a neural network model?

Batch size is the number of training samples processed in one forward and backward pass before the model weights are updated.
An epoch is one complete pass through the whole training dataset; the number of epochs is how many such passes are made during training.
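To see how the two quantities interact, the short sketch below counts weight updates. The dataset size, batch size, and epoch count are made-up numbers, and the last partial batch is assumed to be kept rather than dropped.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    # The final batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_samples / batch_size)


num_samples, batch_size, epochs = 1000, 32, 5
total_updates = updates_per_epoch(num_samples, batch_size) * epochs
print(total_updates)
```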

Q17 - What are exploding and vanishing gradients?

Vanishing and exploding gradients can occur in deep multi-layer neural networks.

Vanishing gradients refer to the scenario in which the gradients get smaller and smaller as they are propagated backward through the layers, so the weights of the early layers barely change and learning stalls.
Exploding gradients refer to the scenario in which the gradients get larger and larger as they are propagated backward, so the weight updates become unstable and the model cannot be trained properly.
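Both effects come from multiplying many per-layer factors together during backpropagation. The sketch below is a simplified model that assumes every layer scales the gradient by the same constant factor, which real networks do not do exactly.

```python
def backpropagated_gradient(per_layer_factor, num_layers):
    """Gradient magnitude after passing backward through num_layers layers."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= per_layer_factor
    return grad


vanishing = backpropagated_gradient(0.5, 50)  # factor below 1 shrinks the gradient
exploding = backpropagated_gradient(1.5, 50)  # factor above 1 blows it up
print(vanishing, exploding)
```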

Q18 - What is PyTorch?

PyTorch is an open-source machine learning library for Python based on the Torch library. It is a deep learning framework originally developed by Facebook's AI Research group, and it is used for applications such as natural language processing and computer vision.

Q19 - What are the essential elements of PyTorch?

PyTorch tensors
NumPy bridge (conversion between tensors and NumPy arrays)
Mathematical operations
Autograd Module
Optim Module
nn Module

Q20 - What are Tensors?

Tensors play an important role in PyTorch; the framework is built entirely around them. A tensor can be thought of as a generalized matrix. It could be a 1D tensor (vector), 2D tensor (matrix), 3D tensor (cube), or 4D tensor (a vector of cubes).

Q21 - What is Early Stopping in Deep Learning?

Early stopping in deep learning is a type of regularization in which training is halted once performance on a validation set stops improving. When training a large network, there is a point at which the model stops generalizing and starts learning the noise in the training data, which hurts its performance on unseen data. Stopping at that point helps prevent the network from overfitting.
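One common way to implement this is a patience counter on the validation loss. The sketch below assumes that rule; the helper `early_stopping_epoch` and the loss values are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop."""
    best = float("inf")
    bad_epochs = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the end


val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch = early_stopping_epoch(val_losses)
print(stop_epoch)
```

In practice the weights from the best epoch (index 2 here) are usually restored after stopping.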

Q22 - What are some criticisms of Neural Networks?

They need large amounts of data to train.
The resulting models are hard to interpret.
Their decisions cannot be explained clearly and easily.
They need substantial computational resources to train.

Q23 - Explain the following variants of Gradient Descent: Stochastic, Batch, and Mini-batch.

Stochastic gradient descent - computes the gradient and updates the parameters using only a single training example at a time.

Batch gradient descent - computes the gradient over the whole dataset and performs just one update per pass through the data.

Mini-batch gradient descent - sits between the two. Instead of a single training example or the full dataset, a small batch of samples is used for each update. Mini-batch gradient descent is one of the most popular optimization algorithms.
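The three variants differ only in how many samples feed each update, so one training loop with a `batch_size` parameter can express all of them. The sketch below fits a single weight to noise-free data y = 2x; the data, learning rate, and epoch count are assumptions chosen so that every variant converges.

```python
import random


def train(xs, ys, batch_size, epochs=200, learning_rate=0.05, seed=0):
    """Fit y = w * x with squared-error loss using the given batch size."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of (w * x - y)^2 over the batch.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


xs = [0.5, 1.0, 1.5, 2.0]
ys = [2 * x for x in xs]

w_sgd = train(xs, ys, batch_size=1)          # stochastic: one sample per update
w_mini = train(xs, ys, batch_size=2)         # mini-batch: a few samples per update
w_batch = train(xs, ys, batch_size=len(xs))  # batch: whole dataset per update
print(w_sgd, w_mini, w_batch)
```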

Q24 - Name a few hyperparameters used in any neural network.

Some of the hyperparameters used in neural networks include the number of layers, the number of neurons per layer, activation function, optimizer, learning rate, batch size, number of epochs, regularization parameters, dropout rate, and weight initialization method.
