Deep Learning

The document provides an overview of deep learning, including its definition, applications, and differences from machine learning. It covers key concepts such as kernel methods, multi-layer perceptrons, activation functions, convolutional neural networks, and natural language processing. Additionally, it discusses word vectors and their significance in NLP, along with local and distributed representations.

Uploaded by

sk5881998
Copyright
© © All Rights Reserved

MID ANSWERS

1. Deep learning is the branch of machine learning that is based on artificial
neural network architectures. An artificial neural network (ANN) uses layers of
interconnected nodes, called neurons, that work together to process and learn
from the input data.
Applications:
Here is a list of applications of deep learning:
1) Image classification, 2) Object detection, 3) Facial recognition, 4) Medical
imaging analysis, 5) Autonomous vehicles, 6) Natural language processing
(NLP), 7) Text translation, 8) Sentiment analysis, 9) Chatbots, 10) Speech
recognition, 11) Speech-to-text conversion, 12) Real-time language translation,
13) Drug discovery, 14) Genomic analysis, 15) Disease detection.
2. Difference between machine learning (ML) and deep learning (DL)

Machine Learning | Deep Learning
Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses an artificial neural network architecture to learn the hidden patterns and relationships in the dataset.
Can work on a smaller amount of data. | Requires a larger volume of data compared to machine learning.
Better for low-label tasks. | Better for complex tasks like image processing, natural language processing, etc.
Takes less time to train the model. | Takes more time to train the model.
A model is created from relevant features which are manually extracted from images to detect an object in the image. | Relevant features are automatically extracted from images. It is an end-to-end learning process.
Less complex, and the result is easy to interpret. | More complex; it works like a black box, and the result is not easy to interpret.
Can work on a CPU and requires less computing power than deep learning. | Requires a high-performance computer with a GPU.

3. What is a kernel method?


Kernel methods are a class of algorithms for pattern analysis, which aim to find
patterns in data through transformations. The core idea of kernel methods is
the kernel trick, which allows them to operate in a high-dimensional feature
space without explicitly computing the coordinates of the data in that space.
Instead, they rely on kernel functions that compute the inner products
between the images of all pairs of data points in the feature space. Kernel
methods are widely used in machine learning, especially in support vector
machines (SVMs), for tasks like classification, regression, and clustering.
Four Common Kernel Functions:
1. Linear Kernel
o Formula: $K(x, y) = x \cdot y$
o Description: This is the simplest kernel, where the inner product
between two vectors (data points) is computed. It is used when
data is linearly separable, meaning it can be separated by a
straight line (in 2D) or a hyperplane (in higher dimensions). The
linear kernel is effective when the data has a clear linear structure.
o Use Case: Text classification, where the dimensionality of the data
is high, but the relationships between the features are simple.
2. Polynomial Kernel
o Formula: $K(x, y) = (x \cdot y + c)^d$
o Description: The polynomial kernel maps the input vectors into a
higher-dimensional space using polynomial functions. The degree
$d$ controls the flexibility of the decision boundary. A higher
degree means the algorithm can handle more complex data
distributions.
o Use Case: Image processing tasks or scenarios where the
relationships between features are nonlinear.
3. Radial Basis Function (RBF) or Gaussian Kernel
o Formula: $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$
o Description: The RBF kernel is one of the most widely used kernels
in machine learning. It transforms the data into an infinite-
dimensional space, where the separation of classes becomes
easier. It measures similarity between data points based on their
distance, where $\sigma$ controls the spread of the kernel.
o Use Case: Support vector machines (SVMs) for complex, non-
linear data, such as image classification or handwriting
recognition.
4. Sigmoid Kernel
o Formula: $K(x, y) = \tanh(\alpha \, x \cdot y + c)$
o Description: The sigmoid kernel resembles the activation function
of a neural network, making it similar to models like multi-layer
perceptrons. The kernel performs a transformation based on the
hyperbolic tangent function.
o Use Case: Can be used in neural network-based approaches or
scenarios where SVMs need to act similarly to neural networks.
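The four kernel formulas above can be written directly as small functions. The following is a minimal sketch in plain Python, not a library implementation; the function names and the default values for c, d, sigma, and alpha are assumptions chosen for illustration.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)

def polynomial_kernel(x, y, c=1.0, d=2):
    # K(x, y) = (x . y + c)^d ; c and d are illustrative defaults
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, alpha=0.5, c=0.0):
    # K(x, y) = tanh(alpha * x . y + c)
    return math.tanh(alpha * dot(x, y) + c)

x = [1.0, 2.0]
y = [2.0, 0.0]
print(linear_kernel(x, y))      # 2.0
print(polynomial_kernel(x, y))  # (2 + 1)^2 = 9.0
print(rbf_kernel(x, y))         # exp(-5 / 2), about 0.082
print(sigmoid_kernel(x, y))     # tanh(1.0), about 0.762
```

Each function returns a similarity score computed from the original vectors only, which is the point of the kernel trick: the high-dimensional feature space is never built explicitly.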
4. Explain the multilayer perceptron

[Figure: a single perceptron]

The perceptron in the diagram above is the building block of the whole of deep
learning. Perceptrons bear similarity to neurons, as the structure is very similar.
A perceptron also takes input and gives output in the same fashion as a neuron
does. Hence the name neural network is generally used for the models in
deep learning.
Multi-Layered Perceptron (MLP):
As the name suggests, in an MLP we have multiple layers of perceptrons.
MLPs are feed-forward artificial neural networks. An MLP has at least 3
layers. The first layer is called the input layer, the next ones are called hidden
layers, and the last one is called the output layer. The nodes in the input layer
don't have an activation; in fact, the nodes in the input layer represent the data
point. If the data point is represented using a d-dimensional vector, then the
input layer will have d nodes. The diagram below makes this point clearer.

[Figure: a multi-layer perceptron with input, hidden, and output layers]
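The layer structure described above can also be sketched as a forward pass in code. This is a minimal illustration in plain Python, assuming a 2-dimensional input (d = 2), one hidden layer with two neurons, and one output neuron; the weights are hypothetical hand-picked values, not trained ones.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
W_hidden = [[1.0, -1.0], [-1.0, 1.0]]
b_hidden = [0.0, 0.0]
W_out = [[1.0, 1.0]]
b_out = [-0.5]

def mlp_forward(x):
    # The input layer has no activation: x is passed through as-is.
    hidden = dense(x, W_hidden, b_hidden, relu)
    return dense(hidden, W_out, b_out, sigmoid)[0]

print(mlp_forward([1.0, 0.0]))  # sigmoid(0.5), about 0.62
print(mlp_forward([1.0, 1.0]))  # sigmoid(-0.5), about 0.38
```

With these particular weights the network scores inputs whose two components differ above 0.5 and inputs whose components match below 0.5, a pattern no single perceptron can produce on its own.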
5. Activation functions and their types
What is an activation function?
Simply put, an activation function is a function added to an artificial
neural network to help the network learn complex patterns in the
data. In comparison with the neuron-based model in our brains, the
activation function decides, at the end, what is to be fired to the next
neuron. That is exactly what an activation function does in an ANN as well. It
takes the output signal from the previous cell and converts it into a
form that can be taken as input by the next cell. The comparison can be
summarized in the figure below.

[Figure: comparison of a biological neuron and an artificial neuron]

Types of activation functions:


1. ReLU (Rectified Linear Unit):
o $f(x) = \max(0, x)$
o Most popular due to its simplicity and efficiency. Used in hidden
layers of deep neural networks.
2. Sigmoid:
o $f(x) = \frac{1}{1 + e^{-x}}$
o Commonly used in the output layer for binary classification
problems.
3. Tanh (Hyperbolic Tangent):
o $f(x) = \tanh(x)$
o Used in hidden layers. Its output is centered around zero, which
often makes it preferable to sigmoid there.
4. Leaky ReLU:
o $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant
o A variant of ReLU that allows a small gradient for negative inputs
to prevent neuron death.
5. Softmax:
o $f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
o Used in the output layer for multi-class classification; converts
outputs into probabilities.
These functions are widely used in most deep learning tasks due to their
effectiveness and computational efficiency.
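The five activation functions listed above can be sketched in a few lines of plain Python. The default alpha = 0.01 for Leaky ReLU and the max-subtraction step in softmax are assumptions added here; the latter is a common numerical-stability trick that does not change the result.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def leaky_relu(x, alpha=0.01):
    # For 0 < alpha < 1, negative inputs are scaled by alpha instead of zeroed.
    return max(alpha * x, x)

def softmax(xs):
    # Subtracting the maximum avoids overflow in exp() and leaves the result unchanged.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-3.0), relu(2.5))    # 0.0 2.5
print(sigmoid(0.0))             # 0.5
print(leaky_relu(-2.0))         # -0.02
print(softmax([1.0, 2.0, 3.0])) # three probabilities that sum to 1
```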
6. Convolutional neural network
A Convolutional Neural Network (CNN) is a type of Deep Learning neural
network architecture commonly used in Computer Vision. Computer vision is a
field of Artificial Intelligence that enables a computer to understand and
interpret the image or visual data.
In a regular Neural Network there are three types of layers:
1. Input Layer: This is the layer through which we give input to our model.
The number of neurons in this layer is equal to the total number of features
in our data (the number of pixels in the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the hidden
layer. There can be many hidden layers, depending on our model and
data size. Each hidden layer can have a different number of neurons,
often greater than the number of features. The output of each layer is
computed by multiplying the output of the previous layer by that layer's
learnable weight matrix, adding learnable biases, and then applying an
activation function, which makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic
function such as sigmoid or softmax, which converts the output for each
class into a probability score for that class.
How Convolutional Layers Work
Convolutional neural networks, or ConvNets, are neural networks that share their
parameters. Imagine you have an image. It can be represented as a cuboid
with a length and width (the dimensions of the image) and a height (the channels,
as images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural
network, called a filter or kernel, on it, with, say, K outputs, and representing
them vertically. Now slide that neural network across the whole image. As a
result, we get another image with a different width, height, and depth.
Instead of just R, G, and B channels, we now have more channels but a smaller
width and height. This operation is called convolution. If the patch size is the
same as that of the image, it is a regular neural network. Because the patch is
small, we have fewer weights.
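The sliding-patch operation described above can be sketched for the simplest case. This is a minimal illustration assuming a single-channel image, a single 2x2 filter, stride 1, and no padding; the image and filter values are made up. As in most deep learning code, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x4 single-channel image and 2x2 filter.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [[1, 0], [0, -1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [-4.0, -4.0, 2.0]
# [-4.0, -4.0, 4.0]
# [6.0, 6.0, 6.0]
```

The 4x4 input shrinks to a 3x3 output, and the same four filter weights are reused at every position, which is the parameter sharing mentioned in the text. A full convolutional layer would apply K such filters across all input channels to produce K output channels.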

7. TensorFlow Playground


TensorFlow Playground is an interactive, web-based visualization tool for
exploring and understanding neural networks. Developed by the TensorFlow
team at Google, this tool allows users to visualize and manipulate neural
networks in real-time, providing a deeper understanding of how these models
work and their underlying principles. The TensorFlow Playground is an
invaluable educational resource for those interested in machine learning,
artificial intelligence, and deep learning.
Overview
The TensorFlow Playground is designed to provide an intuitive interface for
visualizing the inner workings of neural networks. It offers a variety of
customizable parameters, such as network architecture, activation functions,
regularization techniques, and learning rates, enabling users to experiment
with different configurations and observe their effects on model performance.
By allowing users to interact with neural networks in real-time, TensorFlow
Playground fosters a hands-on learning experience that promotes
comprehension of complex machine learning concepts.
Features
The TensorFlow Playground offers a range of features that enable users to
experiment with different aspects of neural networks. Some of the key features
include:
• Network Configuration: Users can customize the number of hidden
layers and neurons in each layer, as well as the input features.
• Activation Functions: The tool provides various activation functions,
such as ReLU, sigmoid, and tanh, allowing users to explore their effects
on model performance.
• Regularization Techniques: Users can experiment with different
regularization techniques, including L1 and L2 regularization, to
understand their impact on preventing overfitting.
• Learning Rate: The learning rate can be adjusted to observe its effect on
the training process and model convergence.
• Training Data: The TensorFlow Playground offers several pre-defined
datasets, such as the spiral, circle, and XOR patterns, to facilitate
experimentation with various data distributions and classification tasks.
• Loss: The tool reports the training loss and test loss as training
proceeds, which helps in evaluating model performance.
• Real-time Visualization: The tool provides real-time visualization of the
training process, including the decision boundaries, neuron activations,
and weight updates, enabling users to gain insights into the inner
workings of neural networks.
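To make the regularization option more concrete, the sketch below shows how L1 and L2 penalties are commonly added to a data loss. It is a generic illustration in plain Python, not the Playground's own code; the weights, the data loss value, and the regularization rate are hypothetical, and some formulations scale the L2 term by 1/2.

```python
def l1_penalty(weights, rate):
    # L1: rate times the sum of absolute weights; tends to push weights to exactly zero.
    return rate * sum(abs(w) for w in weights)

def l2_penalty(weights, rate):
    # L2: rate times the sum of squared weights; tends to shrink large weights.
    return rate * sum(w * w for w in weights)

weights = [0.5, -1.0, 0.0, 2.0]  # hypothetical network weights
data_loss = 0.30                 # hypothetical loss on the training data
rate = 0.01                      # hypothetical regularization rate

print(data_loss + l1_penalty(weights, rate))  # about 0.335
print(data_loss + l2_penalty(weights, rate))  # about 0.3525
```

Raising the rate makes the penalty a larger share of the total loss, so the optimizer favors smaller weights over a closer fit to the training data.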
8. Natural language processing
Natural Language Processing (NLP) is a field of artificial intelligence (AI) that
focuses on the interaction between computers and humans through natural
language. NLP involves enabling computers to understand, interpret, and
generate human language in a way that is both meaningful and useful. Here are
some important NLP tasks:
1. Text Classification
• Automatically categorizing text into predefined categories (e.g., spam
detection, sentiment analysis).
2. Named Entity Recognition (NER)
• Identifying and classifying entities such as people, organizations, dates,
and locations within a text.
3. Machine Translation
• Translating text from one language to another (e.g., Google Translate).
4. Sentiment Analysis
• Determining the emotional tone behind a series of words, used in social
media monitoring or customer feedback.
5. Text Summarization
• Automatically generating a concise summary of a long document.
6. Speech Recognition
• Converting spoken language into text (e.g., voice assistants like Siri or
Alexa).
7. Question Answering
• Building systems that automatically answer questions posed by humans
in natural language.
8. Chatbots and Conversational AI
• Systems that can hold conversations with users, answering queries or
performing tasks.
NLP is widely applied in industries such as customer service, search engines,
healthcare, and more, allowing machines to better understand and interact
with human language.
9. Word vectors
What Are Word Vectors?
• Representation: Word vectors are dense, continuous-valued vectors that
represent words in a lower-dimensional space compared to traditional
one-hot encoding, which is sparse and high-dimensional.
• Dimensionality: Typically, word vectors are represented in dimensions
ranging from 50 to 300, depending on the complexity of the vocabulary
and the amount of training data available.
Importance of Word Vectors
1. Semantic Similarity: Word vectors capture the meanings of words based
on their context. Words with similar meanings are closer together in the
vector space. For example, "king" and "queen" would have similar
vectors.
2. Reduction of Dimensionality: By converting words into a lower-
dimensional space, word vectors facilitate more efficient processing in
machine learning models.
3. Better Generalization: Models that use word vectors can generalize
better to unseen data, as they understand relationships between words
rather than treating them as independent entities.
Techniques for Generating Word Vectors
Several methods exist for creating word embeddings:
1. Word2Vec
o Models: Two primary models: Continuous Bag of Words (CBOW)
and Skip-Gram.
o Training: CBOW predicts a word based on its context, while Skip-
Gram predicts context words from a given word.
o Output: Produces word vectors that represent words based on
their surrounding words in large text corpora.
2. GloVe (Global Vectors for Word Representation)
o Concept: Focuses on capturing global word-word co-occurrence
statistics from a corpus to learn word embeddings.
o Training: Generates embeddings by optimizing the ratio of
probabilities of word co-occurrences.
3. FastText
o Enhancement: An extension of Word2Vec that considers subword
information, which means it can generate embeddings for out-of-
vocabulary words (words not seen during training) by looking at
the n-grams of characters.
o Use Case: Particularly useful for morphologically rich languages or
domains with specialized vocabulary.
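The difference between the two Word2Vec training setups, and the subword idea behind FastText, can be sketched without any training at all. The code below only builds the training examples and the character n-grams; the function names, the window size of 1, and n = 3 are assumptions for illustration, and the angle-bracket boundary markers follow the FastText convention.

```python
def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: Skip-Gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """(context words, target) examples: CBOW predicts the target word from its context."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

tokens = ["the", "cat", "sat"]
print(skipgram_pairs(tokens))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_examples(tokens))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
print(char_ngrams("cat"))
# ['<ca', 'cat', 'at>']
```

Because an unseen word still shares character n-grams with known words, a subword model can assemble a vector for it from those pieces.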
10. Local versus distributed representation
In natural language processing (NLP) and machine learning, representations of
words or features can be categorized into two main types: local representation
and distributed representation. Here’s an overview of both:
Local Representation
• Definition: Local representation refers to a method where each word or
feature is represented using a unique, often sparse vector in a high-
dimensional space. This means that each dimension corresponds to a
specific feature, and a word is represented as a one-hot vector or similar
sparse encoding.
• Example:
o In one-hot encoding, if you have a vocabulary of 5 words: ["cat",
"dog", "fish", "bird", "mouse"], the word "dog" might be
represented as: $\text{dog} = [0, 1, 0, 0, 0]$
• Characteristics:
o Sparsity: The vectors are sparse, meaning most elements are zero.
o High Dimensionality: The dimensionality is equal to the size of the
vocabulary, which can lead to inefficiencies when dealing with
large vocabularies.
o Lack of Semantic Information: Local representations do not
capture the relationships or similarities between words. For
example, "cat" and "dog" would be equally distant from each
other and from "fish" in a one-hot representation.
Distributed Representation
• Definition: Distributed representation involves encoding words as dense
vectors in a lower-dimensional space, where the dimensions of the
vector do not correspond to specific features but rather capture
semantic meaning and relationships through numerical values.
• Example:
o A word like "dog" might be represented as:
$\text{dog} = [0.12, -0.45, 0.33, 0.78]$
• Characteristics:
o Density: The vectors are dense, meaning that most elements are
non-zero.
o Lower Dimensionality: The dimensionality is typically much lower
(e.g., 50 to 300 dimensions) compared to the size of the
vocabulary.
o Semantic Relationships: Distributed representations capture
semantic relationships. For example, the distance between the
vectors for "cat" and "dog" will be smaller than the distance
between "cat" and "car," reflecting their similarities.
o Generalization: They enable better generalization, as similar
words can share similar vector representations, allowing models to
leverage relationships between words.
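The contrast between the two representations shows up directly in the distances between vectors. The sketch below uses the five-word vocabulary from the one-hot example; the three-dimensional dense vectors are hypothetical values invented for illustration, not output from a trained model.

```python
import math

vocab = ["cat", "dog", "fish", "bird", "mouse"]

def one_hot(word):
    """Local representation: a single 1 at the word's index, zeros elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical dense vectors (a real model would learn these from text).
dense = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.0, 0.9],
}

print(one_hot("dog"))  # [0.0, 1.0, 0.0, 0.0, 0.0]

# One-hot: every pair of distinct words is the same distance apart (sqrt(2)).
print(euclidean(one_hot("cat"), one_hot("dog")))
print(euclidean(one_hot("cat"), one_hot("fish")))

# Dense: "cat" is closer to "dog" than to "car".
print(euclidean(dense["cat"], dense["dog"]))
print(euclidean(dense["cat"], dense["car"]))
```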
Key Differences
Feature | Local Representation | Distributed Representation
Dimensionality | High (equal to vocabulary size) | Low (usually 50 to 300 dimensions)
Sparsity | Sparse (mostly zeros) | Dense (mostly non-zeros)
Semantic Information | Lacks semantic relationships | Captures semantic meaning and relationships
Generalization | Poor generalization | Better generalization due to similarity
Examples | One-hot encoding | Word2Vec, GloVe, FastText, BERT
