Parameter Sharing and Typing in Machine Learning
Last Updated: 17 Jun, 2024
We usually constrain or penalise model parameters with respect to a fixed region or point. L2 regularisation (weight decay), for example, penalises model parameters for deviating from the fixed value of zero.
However, we sometimes need other ways to express prior knowledge about suitable parameter values. We may not know precisely what values the parameters should take, but our knowledge of the domain and model architecture tells us that there should be some dependencies between the model parameters.
A common dependency we want to express is that certain parameters should be close to one another.
Parameter Typing
Consider two models performing the same classification task (with the same set of classes) whose input distributions differ somewhat:
- Model A has parameters \boldsymbol{w}^{(A)}
- Model B has parameters \boldsymbol{w}^{(B)}
The two models map the input to two different but related outputs:
\hat{y}^{(A)}=f\left(\boldsymbol{w}^{(A)}, \boldsymbol{x}\right)
and
\hat{y}^{(B)}=g\left(\boldsymbol{w}^{(B)}, \boldsymbol{x}\right)
Assume the tasks are similar enough (perhaps with similar input and output distributions) that the model parameters should be close to each other: for all i, w_{i}^{(A)} should be close to w_{i}^{(B)}. We can exploit this information through regularisation by applying a parameter norm penalty of the form \Omega\left(\boldsymbol{w}^{(A)}, \boldsymbol{w}^{(B)}\right)=\left\|\boldsymbol{w}^{(A)}-\boldsymbol{w}^{(B)}\right\|_{2}^{2}. We used an L2 penalty here, but other choices are possible.
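The penalty above can be sketched in a few lines of NumPy. The parameter values below are made up for illustration; in practice this term would be added to the two task losses, scaled by a regularisation coefficient:

```python
import numpy as np

# Hypothetical parameter vectors for two related models A and B
w_A = np.array([0.9, -1.2, 0.5])
w_B = np.array([1.0, -1.0, 0.4])

def typing_penalty(w_a, w_b):
    """Squared L2 distance between the two parameter vectors:
    Omega(w_A, w_B) = ||w_A - w_B||_2^2."""
    diff = w_a - w_b
    return float(np.dot(diff, diff))

# During training, this term pulls the corresponding entries of
# w_A and w_B towards each other.
penalty = typing_penalty(w_A, w_B)  # = 0.01 + 0.04 + 0.01 = 0.06
```

Minimising the sum of both task losses plus this penalty lets each model fit its own data while staying close to the other model's parameters.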
Parameter Sharing
This approach was used to regularise the parameters of one model, trained as a classifier in a supervised paradigm, to be close to the parameters of another model, trained in an unsupervised paradigm (to capture the distribution of the observed input data). The architectures were designed so that many parameters in the classifier model could be paired with corresponding parameters in the unsupervised model.
While a parameter norm penalty is one way to encourage sets of parameters to be close, the more common approach is to use constraints: to force sets of parameters to be equal. Because we interpret the various models or model components as sharing a unique set of parameters, this form of regularisation is commonly referred to as parameter sharing.
A significant advantage of parameter sharing over regularising parameters to be close (through a norm penalty) is that only a subset of the parameters (the unique set) needs to be stored in memory. In certain models, such as the convolutional neural network, this can lead to a substantial reduction in memory footprint.
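As a concrete sketch of hard weight sharing (not an example from the article), a tied-weight autoencoder stores a single matrix W and reuses its transpose in the decoder, so only one copy of the parameters exists in memory:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix, used by both the encoder and the decoder
# (the decoder uses its transpose). This is a hard constraint:
# the two components literally share parameters, so only W is stored.
W = rng.normal(size=(2, 4))  # 2 hidden units, 4 input features

def encode(x):
    return np.tanh(W @ x)

def decode(h):
    # Decoder weights are tied to W.T -- no second matrix is kept.
    return W.T @ h

x = rng.normal(size=4)
x_hat = decode(encode(x))  # reconstruction has the input's shape
```

Any gradient update to W immediately affects both the encoder and the decoder, which is exactly the "shared unique set of parameters" view described above.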
Convolutional neural networks (CNNs) used in computer vision are by far the most widespread and extensive use of parameter sharing. Many statistical properties of natural images are invariant to translation. A photograph of a cat, for example, can be shifted one pixel to the right and still be a photograph of a cat. CNNs take this property into account by sharing parameters across multiple image locations: the same feature (a hidden unit with the same weights) is computed over different locations in the input. This means that whether the cat appears in column i or column i + 1 of the image, we can find it with the same cat detector.
Thanks to parameter sharing, CNNs have been able to dramatically reduce the number of unique model parameters and to grow much larger without requiring a corresponding increase in training data. This remains one of the best illustrations of how domain knowledge can be efficiently integrated into a network architecture.
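The sharing described above can be demonstrated with a toy 1D convolution (strictly, the cross-correlation that deep learning libraries call "convolution"). The kernel and signal values below are invented for illustration; the point is that one shared kernel is applied at every position, so shifting the input shifts the output:

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide one shared kernel across every position of the input:
    the same weights detect the same feature everywhere."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

# A toy edge detector, shared across all positions of the signal.
kernel = np.array([1.0, -1.0])
signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0])

out = conv1d_valid(signal, kernel)          # [0., -1., 0., 1.]

# Translate the input by one position: the detections move with it.
shifted = np.roll(signal, 1)
out_shifted = conv1d_valid(shifted, kernel)  # [0., 0., -1., 0.]
```

The same two weights detect the edge wherever it occurs, which is the 1D analogue of finding the cat with the same detector in column i or column i + 1.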