Kerala to Sweden: Migrant Job Alignment
Submitted by
Sukanya Thayyil Sunilkumar
Supervisor
Per Johansson
Spring, 2024
ABSTRACT
Contents
1 Introduction
2 Literature Review
2.1 Existing Research on Migration
2.1.1 Global and Indian Context
2.1.2 Theoretical Frameworks
2.2 Job Qualification Alignment
2.2.1 Definition and Importance
2.2.2 Previous Studies
2.3 Challenges in Migration
2.3.1 Common Challenges
2.3.2 Specific Challenges in Sweden
2.4 Sampling Methods
2.4.1 Data Collection Methods
2.4.2 Online Data Collection
3 Methodology
3.1 Research Design
3.1.1 Sampling Design
3.1.2 Sampling Methods
3.2 Data Collection Process
3.2.1 Distribution
3.2.2 Challenges
3.3 Data Cleaning
3.3.1 Standardization
3.3.2 Handling Missing Data
3.4 Ethical Considerations
3.4.1 Informed Consent
3.4.2 Confidentiality
4 Descriptive Statistics
4.1 Bar Plots Explanations for Variables (Appendix F)
4.2 Summary of Statistics
6 Discussion
6.1 Results
6.2 Implications
6.3 Limitations and Challenges
References
Appendix
B Model Implementation
D Multicollinearity
G Missing Data Pattern and Imputation
List of Figures
1 LASSO Coefficient Paths for Various Predictors
2 1. Gender
3 2. Age
4 3. Why did you move from Kerala to Sweden?
5 4. When did you arrive in Sweden?
6 5. Did you move to Sweden alone or with family?
7 6. What is your current visa status?
8 7. How did you apply for your visa the first time?
9 8. Did you have an interview for the first-time visa process?
10 9. What is your highest level of education?
11 10. What is your educational background?
12 11. Do you have a job?
13 12. Does your current job match your qualifications?
14 13. In your job (part-time or full-time), are you receiving a salary or amount as specified by Swedish regulations?
15 14. How satisfied are you with the job you have now?
16 15. What do you think are the main problems in getting a job in Sweden?
17 16. Are you satisfied with medical care in Sweden?
18 17. Have you experienced any health issues after moving to Sweden?
19 18. How much do you care about your mental and physical health after moving to Sweden?
20 19. How satisfied are you with Swedish food culture compared to Kerala food culture?
21 20. How satisfied are you with the amount of personal or family time you have after moving to Sweden?
22 21. How satisfied are you with your life after moving to Sweden?
23 22. Which district are you from in Kerala?
24 23. Which county are you currently residing in Sweden?
25 24. How would you rate cultural integration with Swedish society?
26 25. Have you faced any challenges after moving to Sweden? If yes, how often have you faced these challenges within the first year you came?
27 26. What type of accommodation do you have?
28 27. What is your rent range per month?
29 28. Are you a parent? If yes, how satisfied are you with parental benefits in Sweden?
30 29. How much knowledge did you have about Sweden before moving?
31 Missing Model Data Pattern
32 Missing Data Pattern
List of Tables
1 Cross-tabulation of Job_Match and Gender
2 Cross-tabulation of Job_Match and Age
3 Cross-tabulation of Job_Match and Arrival_Year
4 Cross-tabulation of Job_Match and County_Sweden
5 Cross-tabulation of Job_Match and Education_Level
6 Cross-tabulation of Job_Match and Educational_Background
7 Cross-tabulation of Job_Match and Has_Job
8 Positive Coefficients in LASSO
9 Negative Coefficients in LASSO
10 Confusion Matrix: Cross-tabulation of Predicted vs. Actual Job Match
11 Performance Metrics for Lasso Logistic Regression Model
12 Base Level Variables
13 Unimportant Variables in LASSO
14 Variance Inflation Factors (VIF) for Independent Variables
15 Table with Highlighted Imputed Values in the Model Data
1 Introduction
The experience of moving from Kerala, a state in India, to Sweden has raised many questions
about both the pros and cons of migration. The primary challenge in addressing these questions
was the lack of data; no data exists specifically on people from Kerala residing in Sweden. To
address this issue, a survey was designed, and data were collected from individuals of Kerala
origin, enabling answers to some of these questions that may also be of interest to others.
As people from Kerala speak Malayalam, they are generally called Malayalies. In this
study, the terms "People from Kerala" and "Malayalies" will be used interchangeably.
Malayalies have a long history of migrating to different parts of the world. There is a popular joke
in Kerala that says, "If you go to the moon, you will see a Malayali running a coffee shop
there." While intended as a joke, this illustrates how far and wide Malayalies have spread, even
imagining them on the moon. Migration occurs for a variety of reasons, with Sweden being
perceived by many as a land of opportunities. However, there are also common beliefs within
the community that finding a job in Sweden, particularly part-time employment, is challeng-
ing. This study seeks to confirm or dispel such beliefs and examine whether the jobs Malayali
migrants obtain align with their skills and qualifications. Additionally, the study aims to un-
derstand the motivations, experiences, and challenges faced by Malayalies who have moved to
Sweden. Three main objectives are outlined in this study:
1. Determine whether the jobs that Kerala migrants obtain in Sweden align with their qual-
ifications. The feasibility of modelling this data to predict job-qualification alignment,
identifying key influencing factors, and assessing the potential accuracy of the model will
be explored.
3. Explore their experiences and the challenges they encounter in the process of moving to
and integrating into Swedish society.
One major challenge in data collection was the absence of a sampling frame for the Ker-
ala population living in Sweden, making it impossible to use a probability sampling method.
Instead, Judgmental sampling (Perla and Provost, 2012) and Snowball sampling (Goodman,
1961), both non-probability sampling methods, were employed. Judgmental sampling involved
using prior knowledge of Kerala individuals in Sweden to gather data, while in snowball sam-
pling, current participants helped recruit future participants from their networks. This approach
allowed access to groups and communities that might otherwise be difficult to reach. This
mixed non-probability sampling approach likely resulted in a higher response rate than would have been
achieved using random sampling. Additionally, the method is expected to provide more re-
liable responses than those from a random sample. However, it is important to note that the
results may not be fully generalizable to the Kerala population living in Sweden. A total of 716
responses were received. While no exact figure exists for the number of Malayalies in Sweden,
an estimate based on survey responses suggests that approximately 2,000-3,000 Malayalies re-
side in the country. This estimate was obtained by aggregating information from Malayalies
living in various counties, Facebook groups, WhatsApp groups, and the Indian Embassy in
Sweden. Approximately 25-35% of the Malayali population in Sweden responded to the sur-
vey, suggesting that the responses may provide a representative picture of the experiences of
Malayalies in Sweden.
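The coverage share quoted above follows from simple arithmetic on the 716 responses and the estimated population of 2,000 to 3,000. A minimal Python sketch of that calculation is given here; the function name `coverage_range` is illustrative and not part of the study.

```python
def coverage_range(responses, pop_low, pop_high):
    """Return (lowest, highest) share of the population covered, in percent."""
    if not 0 < pop_low <= pop_high:
        raise ValueError("population bounds must satisfy 0 < pop_low <= pop_high")
    # The lowest share corresponds to the largest population estimate.
    return 100 * responses / pop_high, 100 * responses / pop_low


low, high = coverage_range(716, 2000, 3000)
print(f"{low:.1f}% to {high:.1f}%")  # prints "23.9% to 35.8%"
```

The result, roughly 24% to 36%, is consistent with the approximate range of 25-35% stated in the text.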
The survey yielded several interesting findings. Immigration from Kerala to Sweden began
before 2000 and includes individuals of all ages, including pensioners, living in various types of
housing, with apartments being the most common. The sample contained a higher proportion
of women than men, and Vastra Gotaland County had the largest number of Malayali
respondents. An increasing trend of migration from Kerala to Sweden was observed, and most
respondents reported being satisfied with their lives in Sweden.
The following sections of this paper will delve deeper into various aspects of the study.
The Literature Review will explore existing research on migration, job-qualification align-
ment, challenges faced by migrants, and sampling methods. The Methodology section will
describe the survey design, data collection processes, data cleaning and analysis methods, and
ethical considerations. Descriptive Statistics will present visualizations and initial interpre-
tations of the survey data. In Analysis of Statistical Modeling, the LASSO model and other
statistical techniques will be used to analyze job alignment, with results explaining the impact
of coefficients and interpreting findings from the statistical models, as well as assessing model
performance. Finally, the Discussion will summarize the results, implications, limitations, and
challenges of the study.
2 Literature Review
2.1 Existing Research on Migration
2.1.1 Global and Indian Context
Migration has been a significant area of research globally, with various studies focusing on the
economic, social, and cultural impacts of migration. Research by Cassarino (2004) and Larsson
(2024) highlights the complexities of modern migration patterns and their implications for both
sending and receiving countries. The migration from Kerala, India, to Sweden, is a relatively
underexplored area.
2.1.2 Theoretical Frameworks
Various theoretical frameworks have been used to study migration, including neoclassical eco-
nomic theories, push-pull models, and transnationalism. For example, Lee (1966) introduced
the push-pull theory, which remains a foundational concept in migration studies.
2.2 Job Qualification Alignment
2.2.1 Definition and Importance
Job qualification alignment refers to the degree to which migrants’ qualifications and skills
match the demands of the labour market in the host country. Chiswick and Miller (2003)
emphasize the importance of this alignment for both economic integration and job satisfaction
among migrants.
2.2.2 Previous Studies
Previous studies have shown mixed results regarding the job qualification alignment among
migrants. While some studies, like Piracha and Vadean (2013), suggest that migrants often
face a mismatch, others, such as Bratsberg et al. (2002), indicate that over time, migrants tend
to find jobs that better match their qualifications.
2.3 Challenges in Migration
2.3.1 Common Challenges
Migrants face a variety of challenges, including language barriers, cultural differences, and
legal issues. According to Martin (2009), these challenges can be exacerbated during economic
downturns, which affect migrants’ ability to find employment.
2.3.2 Specific Challenges in Sweden
In Sweden, migrants often struggle with integrating into the labour market due to stringent qual-
ification recognition processes and language requirements (Lundborg and Skedinger, 2013).
These challenges are particularly pronounced for non-European migrants.
2.4 Sampling Methods
2.4.1 Data Collection Methods
Data collection in migration studies typically involves both qualitative and quantitative meth-
ods. Surveys, interviews, and administrative data are common sources. Groves et al. (2011)
provide an in-depth discussion of survey methodology, which is often employed in migration
research.
2.4.2 Online Data Collection
With the advent of digital technologies, online data collection methods have become
increasingly popular. Evans and Mathur (2009) discuss the advantages and limitations of online
surveys, particularly in reaching migrant populations.
3 Methodology
3.1 Research Design
3.1.1 Sampling Design
The sampling design was carefully structured to collect relevant data from Malayalies in Swe-
den. Initially, a draft questionnaire was created based on field knowledge and the questions
that commonly arise when considering migration to Sweden. Additionally, the experiences of
students admitted to Uppsala University in recent years were drawn upon.
To modify this draft, feedback was gathered from Malayalies both inside and outside Swe-
den, representing various age groups and sectors. Their insights were used to improve the ques-
tions and address the research concerns effectively. Following this, a pilot study questionnaire
was developed, and 11 Malayalies currently residing in Sweden were invited to participate. A
diverse group was selected among these participants to ensure a wide range of feedback was
obtained.
Based on the feedback from this pilot study, adjustments were made, including the addition
of new questions, the removal of some, and the reordering of others. The final questionnaire
consists of 9 sections with a total of 30 questions:
1. Personal Information - 2 questions (Figures 2 and 3 in Appendix F show the questions
with bar plots of the responses.)
2. Migration Details - 6 questions (Figures 4 to 9 in Appendix F show the questions with
bar plots of the responses.)
4. Health and Satisfaction - 6 questions (Figures 17 to 22 in Appendix F show the questions
with bar plots of the responses.)
6. Accommodation - 2 questions (Figures 27 and 28 in Appendix F show the questions with
bar plots of the responses.)
7. Parental and Health Benefits - 1 question (Figure 29 in Appendix F shows the question
with a bar plot of the responses.)
9. Consent for Data Collection - 1 question (30. Do you consent to the collection and use
of your data for this survey?)
Out of the 30 questions, 27 are multiple-choice, while 3 (shown in Figures 4, 16, and 18
of Appendix F) allow for multiple responses. All questions are mandatory, but
participants have the option to skip a question using choices like "Not preferred to say", "Do
not want to answer", or "Do not want to answer/Do not know". Each question also includes an
’Other’ option, allowing participants to provide responses that may not have been covered in the
listed options, ensuring that everyone has the freedom to accurately represent their situation.
Any missing data¹ in the responses is therefore due to an intentional choice not to answer, as
indicated in the questionnaire.
To make the data collection more accessible and reduce misunderstandings, the questionnaire
was provided in English with Malayalam² translations in parentheses. This approach
helps participants better understand the questions and ensures more accurate responses.
3.1.2 Sampling Methods
As there was no sampling frame of Malayalies in Sweden, a probability sampling method could
not be used for data collection. Instead, two non-probability sampling methods, Judgmental
sampling (Perla and Provost, 2012) and Snowball sampling (Goodman, 1961), were employed.
Knowledge of this field and connections with Kerala immigrants in Sweden were valuable
in distributing the survey and collecting responses. These methods are often considered to yield more
accurate results than random sampling in this context. In Judgmental sampling, this knowledge was
used to select the most suitable participants for the study. Snowball sampling allowed current
participants to recruit others from their networks, facilitating access to groups and communities
that might otherwise be difficult to reach. By combining these methods, more responses
were gathered, targeting the most relevant individuals and expanding the reach through their
connections.
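The combination of judgmental seeds and snowball referrals described above can be sketched as a wave-by-wave traversal of a contact network. The Python sketch below is illustrative only: the function `snowball_sample`, its parameters, and the toy network are assumptions made for the example, not the procedure or data used in the study.

```python
import random
from collections import deque


def snowball_sample(network, seeds, max_size, referrals_per_person=2, rng=None):
    """Recruit participants wave by wave, starting from hand-picked seeds."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    seen = set()
    queue = deque()
    for seed in seeds:  # judgmental step: seeds are chosen by the researcher
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    sampled = []
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        # Snowball step: each participant forwards the survey to contacts
        # who have not been invited yet.
        contacts = [c for c in network.get(person, []) if c not in seen]
        rng.shuffle(contacts)
        for contact in contacts[:referrals_per_person]:
            seen.add(contact)
            queue.append(contact)
    return sampled


# Hypothetical contact network; G and H form a group with no link to the seed.
toy_network = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"],
    "E": ["C", "F"], "F": ["E"], "G": ["H"], "H": ["G"],
}
sample = snowball_sample(toy_network, seeds=["A"], max_size=10)
print(sorted(sample))
```

In this toy network the sample reaches A to F but never G or H, which illustrates why groups without a connection to any seed can remain unobserved under this design.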
3.2 Data Collection Process
3.2.1 Distribution
The survey was created using Google Forms and distributed through various online platforms
and social networks, including Email, Facebook, Instagram, WhatsApp, and LinkedIn. This
method was chosen to reach as many participants as possible and to make it convenient for
them to complete the survey at their own pace, using their preferred devices.
¹ Responses marked as "Not preferred to say", "Do not want to answer", or "Do not want to answer/Do not
know" are considered as Missing data.
² Malayalam is the native language of Kerala.
3.2.2 Challenges
Several challenges were encountered during data collection, including a low number of respondents
and incomplete responses. The data collection period was initially planned for two weeks. However,
after the first week, it became clear that the number of responses was too low to be meaningful
for analysis. To address this issue, the Google Form link was recirculated with a video request
instead of a written request, which doubled the response rate.
Despite this improvement, the desired number of responses had not been reached by the
end of the two-week period. To gather more data, the collection period was extended by an
additional week, and follow-up reminders were sent.
Initially, email addresses were collected in the survey, but it was observed that some re-
spondents were hesitant to share personal details. Consequently, the requirement to provide an
email address was removed, making the survey simpler and more attractive to participants. As
a result, the total number of respondents reached N = 716.
In the judgmental sampling, an attempt was made to reach at least one person in every
county. However, some counties, such as Blekinge, Jamtland, Kalmar, Norrbotten, and Varm-
land, were not reached. It remains unclear whether this was due to the absence of Malayalies
in these counties, the survey not being seen in time, or respondents being unwilling to partic-
ipate. It is known that Malayalies reside in Kalmar County, but no responses were received
from there. This was one of the challenges faced during data collection.
3.3 Data Cleaning
3.3.1 Standardization
All responses were converted into categorical factors. Responses marked as "Not preferred
to say," "Do not want to answer," or "Do not want to answer/Do not know" were treated as
missing data. These responses were considered missing because they
do not provide useful information about the questions being studied. Treating them as missing
helps prevent the model from being affected by answers that add no real value, which
could lead to incorrect or biased results. The analysis
can then focus on the answers that do provide useful data, leading to better predictions and a clearer
understanding of the survey results.
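The recoding rule described above can be sketched in a few lines of Python. The three opt-out labels are taken from the questionnaire as quoted in the text; the helper names `recode_missing` and `missing_share` and the example column are illustrative assumptions, and the study itself carried out its analysis in R.

```python
# Opt-out answers that the study treats as missing data.
OPT_OUT_LABELS = {
    "Not preferred to say",
    "Do not want to answer",
    "Do not want to answer/Do not know",
}


def recode_missing(responses):
    """Replace opt-out answers with None and keep every other answer as is."""
    return [None if answer in OPT_OUT_LABELS else answer for answer in responses]


def missing_share(responses):
    """Percentage of answers in one survey column that count as missing."""
    recoded = recode_missing(responses)
    if not recoded:
        return 0.0
    return 100 * sum(answer is None for answer in recoded) / len(recoded)


# Hypothetical answers to a single question.
example = ["Female", "Male", "Not preferred to say", "Female"]
print(recode_missing(example))  # ['Female', 'Male', None, 'Female']
print(missing_share(example))   # 25.0
```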
In the "Other" option, respondents were provided space to write comments, which was
helpful in understanding their exact situations. For analysis purposes, these comments were
converted into appropriate categories or left as "Other" when necessary. Some comments were
explanations of existing categories, while others offered new opinions. The new opinions were
incorporated into relevant categories for analysis.
3.3.2 Handling Missing Data
Missing data were identified in the 27 multiple-choice questions, as shown in Figure 32
(Appendix G), and in the modelling data, as shown in Figure 31 (Appendix G). To handle missing
data in the subsequent multivariate modelling, Multiple Imputation³ was used through the
"MICE" package in the R programming software, as discussed in Zhang (2016). The results of this
imputation are presented in Table 15 (Appendix G). It should be noted that some categories in
the table are hidden to protect the privacy of the responses.
Multiple Imputation creates several different plausible datasets, and in the analysis, the
results from each one are combined to obtain valid inference.
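The text does not state how the results from the imputed datasets are combined. A common choice is Rubin's rules, and the Python sketch below assumes that approach; the function name `pool_rubin` and the numbers are illustrative only. In R, the mice package offers `pool()` for this step.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error of that estimate in each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)       # average of the per-dataset estimates
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # variance between imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up coefficient estimates from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, round(total, 4))  # 2.0 1.8333
```

The between-imputation term is what distinguishes this from single imputation: it widens the total variance to reflect uncertainty about the missing values.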
3.4 Ethical Considerations
3.4.1 Informed Consent
Participants were informed about the purpose of the study, what their involvement would be,
and how their data would be used. Informed consent was obtained before participants
completed the survey, making sure they understood that their participation was completely
voluntary.
3.4.2 Confidentiality
Strong steps were taken to protect participants’ privacy. No personal information was collected
in the survey, ensuring that responses remain completely anonymous. The data is presented
only in summary form, making it impossible to trace any answers back to an individual. All
data will be securely stored and accessible only to those directly involved in the study. These
precautions ensure that participants’ privacy is protected throughout the research.
³ Multiple Imputation (MI) is a statistical technique used to handle missing data in datasets. Instead of filling in
missing values with a single estimate, MI replaces each missing value multiple times, creating several "complete"
datasets. These datasets are then analyzed separately, and the results are pooled to provide a more robust and
accurate estimate.
4 Descriptive Statistics
4.1 Bar Plots Explanations for Variables (Appendix F)
Gender Figure 2 shows the distribution of respondents by gender. The majority are Female
(58.40%), followed by Male (40.80%). A very small share identified as Other (0.10%),
and 0.70% chose not to disclose their gender.
Age Figure 3 shows the age distribution of respondents. The largest group is in the 33-37 age
range (35.60%), followed by those in the 28-32 range (31.10%) and 38-42 (16.80%). Younger
and older age groups are less represented, with the smallest groups being those aged 53-57 and
63-67.
Move Reason Figure 4 shows the reasons for moving to Sweden. The most common reason is
Job (46.40%), followed by Career growth (33.20%), Higher education (32.50%) and To settle
in Europe (29.30%). The least common reasons are Family reunion (21.10%) and Schengen
visa (8.70%). Because respondents could select more than one reason, the percentages sum to
more than 100%.
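Because this question allows several answers per respondent, each percentage is the share of respondents who selected that option, so the shares need not add up to 100%. A minimal Python sketch with made-up answers is given here; the function name `multi_response_percentages` and the toy data are illustrative, not the survey data.

```python
from collections import Counter


def multi_response_percentages(selections):
    """Share of respondents (in percent) who selected each option.

    selections: one list of chosen options per respondent.
    """
    n = len(selections)
    if n == 0:
        return {}
    # set() guards against an option being counted twice for one respondent.
    counts = Counter(option for chosen in selections for option in set(chosen))
    return {option: 100 * count / n for option, count in counts.items()}


# Four hypothetical respondents.
answers = [
    ["Job", "Career growth"],
    ["Job"],
    ["Higher education"],
    ["Job", "Higher education"],
]
shares = multi_response_percentages(answers)
print(shares["Job"], shares["Higher education"], shares["Career growth"])
print(sum(shares.values()))  # 150.0, more than 100 because answers overlap
```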
Arrival Year Figure 5 shows the distribution of respondents by the year they arrived in Sweden.
The highest numbers of arrivals occurred in 2023 (26.50%) and 2022 (21.80%), which
suggests a recent surge in immigration. Overall, arrivals show an increasing trend over the
years, indicating a strong and growing migration flow into Sweden.
The years 2020 and 2024 show lower numbers of arrivals compared to other years. The lower
number for 2020 may be due to the COVID-19 pandemic, which severely affected global travel
and immigration. The number for 2024 is low because the year was not yet complete when the
data were collected. The counts may be affected by snowball sampling, but the upward trend is
likely to remain even if the exact counts change.
Move With Family Figure 6 shows whether respondents moved alone or with family. A
large majority moved With family (75.80%), while a smaller portion moved Alone (24.00%).
Visa Status Figure 7 shows the visa statuses of respondents. The largest groups hold a
Dependent visa (28.40%) and a Job visa (27.90%). Smaller proportions are Student (17.00%) and
Citizenship / Swedish passport (12.00%). The least common statuses are Permanent residence
(10.60%) and Job seeker (3.40%).
Visa Application Figure 8 shows who applied for the visa. The most common applicant was
Company (38.70%), followed by Yourself / Family Member (31.10%) and Agency (29.20%). A
small number did not specify (1.00%).
Visa Interview Figure 9 shows whether respondents had a visa interview. A majority did
not have an interview (70.30%), while 28.60% did. A very small
percentage did not provide information (1.10%).
Education Level Figure 10 shows the highest level of education attained. Most respondents
have a Master’s degree (51.50%), followed by Bachelor’s degree (42.00%). PhD and above,
Higher secondary and Secondary are less common, with 5.00%, 1.10% and 0.30%, respec-
tively.
Educational Background Figure 11 shows the distribution of fields of study. The most
common field is Information Technology (IT) with 32.40%, followed by Electrical Engineering
(13.10%) and Mechanical Engineering (9.90%). Fields such as Tourism and Hospitality,
Social Sciences, Human Resources Management, Architecture and Design, and Agriculture and
Veterinary Science have few participants, each with 0.60%. The least common educational
background is Environmental Studies, with 0.30%.
Employment status Figure 12 shows the current employment status. The majority answered Yes,
full-time employed (57.70%), followed by Yes, part-time employed (20.10%). Smaller groups are
Job Searching (15.10%) and Students (6.70%). Pensioners (0.10%) are rare, and
0.30% did not specify their status.
Job Match Figure 13 shows how well respondents feel their job matches their qualifications.
Most feel their job matches their qualifications (53.10%). However, 14.80% feel their job does
not match their qualifications, 9.40% feel Overqualified, and 0.70% feel Underqualified. A
significant portion (21.80%) finds the job match not applicable.
Job Satisfaction Figure 15 shows job satisfaction levels. The largest group is very satisfied
(28.10%), followed closely by those who are satisfied (27.80%). A significant portion, 21.80%,
find job satisfaction not applicable. Smaller percentages are very dissatisfied (1.50%) and
dissatisfied (5.30%).
Job Problems Figure 16 shows what respondents consider the main problems in getting a job in
Sweden. The most frequently reported issue is the language barrier (79.30%), indicating significant
difficulty in communication. This is followed by lack of professional network (38.50%), which
affects career growth and job searching. Limited job opportunities in my field is reported by 25.70%
of participants, highlighting constraints in job availability specific to their qualifications. Issues
such as lack of recognition of foreign qualifications and cultural differences are less common but
still notable.
Medical Care Satisfaction Figure 17 shows the satisfaction levels with medical care. The
largest group feels neutral (39.00%) about the medical services they receive, suggesting a
mixed or moderate experience. A significant portion is dissatisfied (20.00%) or very
dissatisfied (7.10%), indicating notable dissatisfaction with the medical care. Satisfied and Very
Satisfied account for 24.70% and 6.10%, respectively. A small percentage did not provide a
response (3.10%).
Health Issues Figure 18 displays reported health issues among participants. Vitamin
deficiencies are the most common health problem, reported by 43.30% of respondents, highlighting a
prevalent concern. Depression affects 16.10% of participants, indicating a significant mental
health issue. Common health issues/body or part pains and allergies are less frequent. The Not
applicable category is also large (37.40%), suggesting that many respondents do not
experience significant health issues.
Health Care Satisfaction Figure 19 shows how much respondents care about their mental and
physical health after moving to Sweden. Most respondents answered very much (24.00%), quite a
lot (34.50%) or somewhat (27.50%), indicating that most pay attention to their health. Smaller
proportions answered not much (11.30%) or not at all (2.40%).
Food Culture Satisfaction Figure 20 shows satisfaction with local food culture. The largest
group is neutral (45.70%), showing mixed feelings about the local food. Satisfied respondents
make up 23.60%, and a smaller portion is dissatisfied (14.50%) or very dissatisfied (6.60%). A
small fraction is very satisfied (8.40%).
Family Time Satisfaction Figure 21 shows satisfaction with time spent with family. The
majority feel very satisfied (43.60%) or satisfied (34.60%), indicating a strong positive expe-
rience with family time. Smaller segments are neutral (14.50%), dissatisfied (3.80%), or very
dissatisfied (2.10%).
Life Satisfaction Figure 22 indicates overall life satisfaction. Most respondents are satisfied
(46.90%) or very satisfied (26.50%), reflecting a generally positive outlook on life. Smaller
percentages are neutral (19.80%), dissatisfied (5.20%), and very dissatisfied (1.00%).
District of Kerala Figure 23 shows the distribution of participants from various districts
in Kerala. Ernakulam has the highest representation (20.50%), followed by Thrissur (12.20%)
and Kottayam (10.10%). The least represented districts are Idukki (1.80%), Kasaragod (1.40%)
and Wayanad (1.00%).
County in Sweden Figure 24 shows the distribution of participants across Swedish counties.
The most represented county is Vastra Gotaland County (33.90%), followed by Skane
County (15.20%) and Stockholm County (13.70%). The least represented counties are
Gavleborg (0.30%), Vasternorrland (0.30%) and Orebro (0.10%).
Challenges Faced Figure 26 shows the distribution of challenges faced by the respondents.
The majority of participants reported experiencing challenges Sometimes (44.60%), while 22.20%
faced challenges Rarely. A significant portion also indicated they encountered challenges Of-
ten (18.30%). Those who faced challenges Very often constituted 7.10%, whereas only 6.30%
reported Never experiencing challenges. A small percentage of respondents (1.50%) did not
provide any response.
Accommodation Type Figure 27 shows the types of accommodation respondents are living
in. The majority reside in Apartments (64.80%), followed by those living in a House (15.10%).
Other types of accommodation include Shared living (5.60%) and various forms of Student
housing, such as Studio apartments (5.40%), Corridor rooms (4.20%), One-room apartment
(2.70%) and a Two or more room apartment (2.10%). A negligible number (0.10%) did not
specify their accommodation type.
Rent Range Figure 28 shows the distribution of respondents’ rent ranges. The most common
rent ranges are 5,000-7,500 SEK and 7,500-10,000 SEK, each accounting for 18.90% of the
respondents. Another significant group pays between 10,000-12,500 SEK (17.00%). Fewer re-
spondents pay 12,500-15,000 SEK (9.90%) and Less than 5,000 SEK (9.60%). Smaller groups
pay 15,000-17,500 SEK (3.90%), 17,500-20,000 SEK (1.80%), and More than 20,000 SEK
(1.10%). Additionally, 15.40% of respondents reported Not Paying Rent, while 3.50% did not
provide a response. It is possible that some of the respondents who own homes and do not pay
rent might be paying loans instead. Those who are paying loans could be reflected in the higher
rent ranges or in the Not Paying Rent category.
Parental Status Figure 29 shows the respondents’ satisfaction with their parental status. A
significant portion of respondents reported being Very satisfied (26.30%) and Satisfied (25.70%)
with their parental status. Another large group indicated that this question was Not applicable
to them (35.10%). Those who were Neutral made up 9.10% of the respondents. Only a small
percentage expressed being Dissatisfied (1.00%) and Very dissatisfied (0.30%), with 2.70% not
providing a response.
Sweden Knowledge Figure 30 shows the respondents’ level of knowledge about Sweden before
moving there. The largest group described themselves as Moderately knowledgeable
(45.80%), followed by those who are Slightly knowledgeable (33.40%). A smaller segment of
respondents considered themselves Not knowledgeable at all (14.80%), while 5.60% felt they
were Very knowledgeable. A minimal percentage (0.40%) did not respond to this question.
Of the total sample of N = 716 respondents, most are female (58.40%), and the most common
age range is 33 to 37 years. The most frequently cited reason for moving to Sweden was job
opportunities (46.40%). The largest group of respondents arrived in 2023, indicating a recent
increase in migration. Most people moved to Sweden with their families (75.80%). Regarding
visa status, the most common were dependent visas (28.40%) and job visas (27.90%). The
largest share (38.70%) had their visa applications handled by their companies, and most did
not have a visa interview (70.30%).
In terms of education, more than half of the respondents hold a Master’s degree (51.50%),
with a significant number having a background in Information Technology (32.40%). Em-
ployment data shows that 57.70% of participants are employed full-time. Most respondents
(53.10%) believe their current job matches their qualifications, and 64.00% report receiv-
ing salaries in compliance with Swedish regulations. Job satisfaction is generally high, with
28.10% being very satisfied. However, language barriers are a significant issue, affecting
79.30% of participants.
When it comes to medical care, 39.00% of respondents feel neutral about the quality of care
they receive. Health issues such as vitamin deficiencies are common, reported by 43.30% of
participants. Overall, most people are satisfied with their healthcare experience, with 34.50%
rating it as "quite a lot". Opinions on local food culture are mixed, with 45.70% feeling neutral.
However, family time is a positive aspect for many, with 43.60% very satisfied with the time
they spend with their families.
Life satisfaction is high, with 46.90% of respondents being satisfied. The majority of re-
spondents come from the district of Ernakulam in Kerala (20.50%) and live in Vastra Gotaland
County in Sweden (33.90%). Cultural integration seems to be challenging for many, with
50.70% rating it as medium. Most participants face challenges sometimes, with 44.60% re-
porting occasional difficulties.
In terms of living conditions, 64.80% of respondents live in apartments, and the most
common rent ranges are between 5,000 and 10,000 SEK. Parental satisfaction is mixed, with
35.10% finding it not applicable to their situation. Finally, most people felt moderately knowl-
edgeable about Sweden before moving, with 45.80% rating their knowledge at this level.
Originally, "Job_Match" had five categories: "Not applicable," "Yes, it matches," "No, it doesn’t
match," "No, I am overqualified," and "No, I am underqualified." For our analysis, we excluded
the "Not applicable" responses, which left us with 558 valid responses out of 716.
Since the counts for "No, it doesn’t match," "No, I am overqualified," and "No, I am under-
qualified" were very low compared to "Yes, it matches" (as shown in Figure 13), we combined
these three categories into a single "No" category. The "Yes, it matches" category was kept as
is. We coded "No" as 0 and "Yes, it matches" as 1.
By recoding the responses in this way, "Job_Match" became a binary variable, making it
suitable for binary logistic regression analysis.
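The recoding described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the vector `job_match_raw` and its six example answers are hypothetical, and only the category labels are taken from the text.

```r
# Hypothetical raw responses; the labels follow the categories named in the text.
job_match_raw <- c("Yes, it matches", "No, I am overqualified", "Not applicable",
                   "No, it doesn't match", "No, I am underqualified",
                   "Yes, it matches")

# Drop "Not applicable", then code "Yes, it matches" as 1 and every "No" answer as 0.
kept <- job_match_raw[job_match_raw != "Not applicable"]
job_match <- as.integer(kept == "Yes, it matches")

print(table(job_match))
```

Collapsing the three "No" answers with a single logical comparison keeps the rule in one place, so it cannot drift apart for the individual categories.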
As a consequence of some categories having very low responses, smaller categories were com-
bined into broader ones. The changes were made as follows:
• Gender: The categories "Female" and "Other" were combined into a single category
called "Female." As shown in Figure 2 (Appendix F), "Other" had very few responses.
Merging it keeps these respondents in the analysis and prevents a low-frequency
category from adversely affecting the model. Outliers are less of a concern in
categorical data, but very sparse categories can still harm model accuracy.
• Age: Ages above 43 were grouped into one category named "43 and above". (See Figure
3, Appendix F)
• Arrival Year: All years before 2018, including 2018, were combined into the category
"2018 and before". The years 2023 and 2024 were grouped into "2023-2024" since 2024
is not yet complete. (See Figure 5, Appendix F)
• Education Level: The categories "Secondary" and "Higher secondary" were combined
into "Higher secondary and below". (See Figure 10, Appendix F)
• Educational Background: The categories "Natural Sciences," "Finance and Accounting,"
"Data Science," "Education and Pedagogy," "Media and Communications," "Human
Resources Management," "Architecture and Design," "Agriculture and Veterinary
Science," "Legal Studies," "Mathematics/Statistics," "Tourism and Hospitality,"
"Environmental Studies," "Arts and Humanities," and "Social Sciences" were combined
into "Other Education background".
• Has Job: The categories "Yes, full-time" and "Pensionist" were combined into "Yes, full-
time" because pensionists receive income similar to full-time employees. (See Figure 12,
Appendix F)
• County Sweden: Counties with fewer than 40 responses (See Figure 24, Appendix F)
were grouped into an "Other county" category. The "Other county" category includes:
                Gender
Job Match   Female   Male   Total
No              99     79     178
Yes            183    197     380
Table 1 shows how job match (whether a participant’s job matches their qualifications)
relates to gender. Out of 558 participants, 380 (68.1%) reported that their job matched their
qualifications. Of those who reported a job match, 48.2% are female.
Looking more closely at gender, out of 282 female participants, 183 (64.9%) reported a
job match, while 99 (35.1%) did not. Among the 276 male participants, 197 (71.4%) reported
a job match, while 79 (28.6%) reported a mismatch. Females are more likely to report a job
mismatch than males. This suggests that although many participants feel their jobs align with
their qualifications, a significant portion, particularly among females, does not.
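The two kinds of percentages quoted for Table 1 (the share of females among job matches, and the match rate within each gender) can be reproduced from the counts with a short base R sketch. The counts are those of Table 1; the object names are illustrative.

```r
# Counts taken from Table 1 (rows: Job Match, columns: Gender).
tab <- matrix(c(99, 183, 79, 197), nrow = 2,
              dimnames = list(Job_Match = c("No", "Yes"),
                              Gender = c("Female", "Male")))

# Share of each gender within a job-match outcome (row percentages).
row_pct <- round(100 * prop.table(tab, margin = 1), 1)

# Job-match rate within each gender (column percentages).
col_pct <- round(100 * prop.table(tab, margin = 2), 1)

print(addmargins(tab))
print(row_pct)
print(col_pct)
```

The `margin` argument decides which total each cell is divided by, which is why the same count of 183 yields 48.2% in one table and 64.9% in the other.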
                              Age
Job Match   18-27   28-32   33-37   38-42   43 and above   Total
No             27      70      49      25              7     178
Yes            18      86     160      82             34     380
Table 2 displays the cross-tabulation between the Job_Match variable and Age groups. The
results show how participants’ age relates to whether their job matches their qualifications.
Out of the 558 participants, 380 (68.1%) reported a job match, while 178 (31.9%) reported
a job mismatch. Among the youngest age group, 18-27, 60.0% reported a job mismatch,
while 40.0% reported a job match. For the 28-32 age group, 55.1% reported a job match,
while 44.9% reported a mismatch. In the 33-37 age group, the share of job matches rose to
76.6%, with only 23.4% reporting a mismatch. Participants aged 38-42 showed the same
rates, with 76.6% reporting a job match and 23.4% reporting a mismatch. Finally, the 43 and
above age group had the highest percentage of job matches, with 82.9% reporting a match
and only 17.1% reporting a mismatch.
This data suggests that job match improves with age, with older participants being more
likely to find jobs that align with their qualifications.
Table 3: Cross-tabulation of Job_Match and Arrival_Year
                                   Arrival Year
Job Match   2018 and before   2019   2020   2021   2022   2023-2024   Total
No                       24      6      9     21     39          79     178
Yes                     112     36     27     62     72          71     380
Table 3 presents the cross-tabulation between Job_Match and the year of arrival in Sweden.
This analysis reveals the relationship between the duration of stay in Sweden and job match
outcomes. Participants who arrived in 2018 or before showed a high percentage of job
matches (82.4%), with only 17.6% reporting a mismatch. For those arriving in 2019, 85.7%
reported a job match, while 14.3% reported a mismatch. Participants who arrived in 2020 had
a lower job match percentage (75.0%), and 25.0% reported a mismatch. Those arriving in 2021
showed similar rates, with 74.7% reporting a job match and 25.3% reporting a mismatch.
For those arriving in 2022, 64.9% reported a job match, with 35.1% reporting a mismatch.
Finally, of the participants who arrived in 2023-2024, 47.3% reported a job match
and 52.7% reported a mismatch.
This data indicates that participants who have been in Sweden longer tend to have a higher
likelihood of finding jobs that match their qualifications.
                                  County Sweden
Job Match   Other   Skane   Stockholm   Uppsala   Vasterbotten   Vastra Gotaland   Total
No             51      20          17        22             18                50     178
Yes            63      67          53        32             28               137     380
Table 4 shows the cross-tabulation between Job_Match and the county in Sweden where
participants reside. Participants residing in the "Other" counties category showed a job match
percentage of 55.3%, while 44.7% reported a mismatch. In Skane, 77.0% reported a job
match, with 23.0% reporting a mismatch. In Stockholm, 75.7% of participants reported a job
match, and 24.3% reported a mismatch. Uppsala participants had 59.3% reporting a job match
and 40.7% a mismatch. For Vasterbotten, 60.9% reported a job match, while 39.1% reported
a mismatch. Lastly, in Vastra Gotaland, 73.3% reported a job match, with 26.7% reporting a
mismatch.
These findings show that for this sample job match likelihood varies by county, with Skane
and Stockholm showing higher alignment with job qualifications compared to other counties.
                                        Education Level
Job Match   Bachelors degree   Higher secondary and below   Masters degree   PhD and above   Total
No                        71                            4              101               2     178
Yes                      167                            5              176              32     380
Table 5 provides the cross-tabulation between Job_Match and the education level of
participants. Among participants with a Bachelor’s degree, 70.2% reported a job match, while
29.8% reported a mismatch. Those with higher secondary education or below had the lowest
job match percentage (55.6%), with 44.4% reporting a mismatch, although this group contains
only nine respondents. Participants with a Master’s degree had a job match percentage of
63.5%, and 36.5% reported a mismatch. For those with a PhD or above, 94.1% reported a job
match, while only 5.9% reported a mismatch.
The data shows that a PhD is associated with a markedly higher likelihood of finding a job
that matches one’s qualifications. The pattern is not uniform across levels, however: the match
rate for Master’s degree holders (63.5%) is lower than for Bachelor’s degree holders (70.2%).
Table 6: Cross-tabulation of Job_Match and Educational_Background
                                       Educational Background
Job Match   Business   Electrical    Information   Mechanical    Medical   Other       Other         Total
                       Engineering   Technology    Engineering             Education   Engineering
No                19            17            35            21        20          34            32     178
Yes               16            60           138            45        22          48            51     380
Table 6 presents the cross-tabulation between Job_Match and participants’ educational
backgrounds. Among participants with a Business, Management, or Marketing background, 45.7%
reported a job match, while 54.3% reported a mismatch. In Electrical Engineering, 77.9%
reported a job match, and 22.1% reported a mismatch. Information Technology participants
had the highest job match percentage, 79.8%, with only 20.2% reporting a mismatch.
Participants with a Mechanical Engineering background reported a 68.2% job match, while 31.8%
reported a mismatch. For those in the Medical and Health Sciences field, 52.4% reported a
job match, while 47.6% reported a mismatch. In Other Education fields, 58.5% reported a job
match, with 41.5% reporting a mismatch. Lastly, participants from Other Engineering fields
had a 61.4% job match percentage, with 38.6% reporting a mismatch.
This data shows considerable variation in job match across different educational backgrounds,
with Information Technology and Electrical Engineering fields exhibiting the highest job match
percentages.
                      Has Job
Job Match   Yes, full-time   Yes, part-time   Total
No                      58              120     178
Yes                    356               24     380
Table 7 shows the cross-tabulation between Job_Match and whether participants currently
have a job (either full-time or part-time). Among those who reported having a full-time job,
86.0% reported a job match, while 14.0% reported a mismatch. Participants with part-time
jobs were more likely to report a job mismatch, with 83.3% reporting a mismatch and only
16.7% reporting a job match.
These results show that for this sample full-time employment is more strongly associated
with finding a job that matches one’s qualifications, while part-time employment is linked to a
higher rate of job mismatch.
In the following, the factors that are most important in describing job matching are examined
using Lasso logistic regression (footnote 5), that is, Lasso in a Generalized Linear Model
(GLM) (Friedman et al., 2010). Here, the response variable is binary, and the independent
variables are factors. Lasso in a GLM is a suitable choice, as it is designed to handle binary
outcomes and factor variables appropriately.
To determine the size of the penalty parameter, denoted λ in the Lasso regression,
cross-validation was employed (footnote 6). An estimate of λ = 0.0128 was obtained, which
indicates that the Lasso regression is not strongly penalizing the coefficients, suggesting a
relatively unrestricted model. A weak penalty allows a closer fit to the training data and thus
carries some risk of overfitting; selecting λ by cross-validation helps to limit that risk.
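For reference, the criterion that Lasso logistic regression minimizes can be written as below, following the binomial formulation with a pure Lasso penalty in Friedman et al. (2010). The notation is introduced here and is not taken from the thesis: y_i is the binary Job_Match response, x_i the vector of dummy-coded predictors for respondent i, N the number of respondents, p the number of predictors, and β_0 the intercept.

```latex
\[
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\]
```

The first term is the average negative log-likelihood of the logistic model, and the second is the penalty whose weight λ is chosen by cross-validation. The intercept is not penalized.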
Explanation of the LASSO Coefficient Paths Figure 1 shows the coefficient paths for var-
ious predictors as the penalty parameter λ changes. In Lasso regression, as λ increases, the
model increasingly penalizes larger coefficients, driving many of them to zero. This results in
the elimination of less important predictors, making the model more interpretable and reducing
the risk of overfitting. The figure visually represents this process, showing how coefficients are
shrunk to zero as λ increases. The coefficients that remain non-zero as λ grows are those that
have the strongest association with the outcome variable (Agresti, 2015).
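The way a growing penalty drives coefficients to exactly zero can be illustrated with the soft-thresholding operator, which is the Lasso solution for squared-error loss with a single standardized predictor. The sketch below is only an illustration of that mechanism, not the fitting procedure used in the thesis (the logistic model is fitted iteratively), and the coefficient values are hypothetical.

```r
# Soft-thresholding operator: shrinks a coefficient toward zero by lambda
# and sets it to exactly zero once |beta| <= lambda.
soft_threshold <- function(beta, lambda) {
  sign(beta) * pmax(abs(beta) - lambda, 0)
}

# Hypothetical unpenalized coefficients (illustrative values only).
beta <- c(strong = -3.0, moderate = 1.2, weak = 0.15)

# Evaluate the operator on a grid of increasing penalties.
lambdas <- c(0, 0.1, 0.5, 2)
path <- sapply(lambdas, function(l) soft_threshold(beta, l))
colnames(path) <- paste0("lambda=", lambdas)
print(path)
```

Reading the printed matrix from left to right mirrors reading a coefficient path plot: the weak coefficient vanishes first, and the strong one survives the largest penalty.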
Footnote 5: Alternative approaches that were considered included using information criteria,
e.g., the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), or
Ridge regression.
Footnote 6: Lasso logistic regression effectively shrinks some coefficients to zero, performing
automatic variable selection. This helps identify the most relevant predictors while avoiding
overfitting (Agresti, 2013; Agresti, 2015). The R code used for model selection is provided in
Appendix B.
5.4.1 Results
The coefficients from the final Lasso logistic regression model indicate which factors are most
important in predicting whether someone obtains a job that matches their qualifications. As
the levels of the factors have been included as separate binary variables in the Lasso model,
they will be denoted as variables or categories in the discussion below. The variables
identified as important in this model are those with non-zero coefficients, while variables with
zero coefficients are considered unimportant.
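The expansion of factor levels into separate binary variables can be sketched with base R's `model.matrix()`. The data frame below is hypothetical and uses only two of the predictors. The column names `Gender` and `Has_Job` are assumptions modelled on the variable names in the text.

```r
# Hypothetical mini data set; factor levels follow the categories used in the text.
d <- data.frame(
  Gender = factor(c("Female", "Male", "Male", "Female")),
  Has_Job = factor(c("Yes, full-time", "Yes, part-time",
                     "Yes, full-time", "Yes, part-time"))
)

# model.matrix() expands each factor into 0/1 indicator columns, one per
# non-reference level. The intercept column is dropped here because the
# fitting routine is assumed to add its own intercept.
x <- model.matrix(~ Gender + Has_Job, data = d)[, -1, drop = FALSE]
print(colnames(x))
print(x)
```

Each resulting column name joins the variable name and the level, which is the same naming pattern as the legend entries of the coefficient path plot (for example `GenderMale`).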
From each coefficient, it can be determined how a specific category affects the chances of
obtaining a job match. A positive coefficient indicates that the category increases the chances
of a job match compared to the base-level category, whereas a negative coefficient indicates
that the category decreases the chances of a job match compared to the base level. These
coefficients assist in understanding which categories make it more or less likely for a migrant
to find a job that fits their qualifications.
Base Level Variables Table 12 lists the base-level categories in the model. These base levels
are the reference categories against which other categories of the same variable are compared.
For example, the base level for Gender is Female, so the coefficient for Male is interpreted
relative to females. Selecting the base level is important in regression with categorical
variables because every coefficient is interpreted relative to the base-level category. In this
model, the most frequent category of each variable was chosen as the base level.
The base-level variables are given below:
• Gender: Female
• Age: 33-37
• County Sweden: Vastra Gotaland County
• Education Level: Masters degree
• Educational Background: Information Technology
• Has Job: Yes, full-time
• Arrival Year: 2023-2024
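Choosing the most frequent category as the base level can be done in base R with `relevel()`. This is a minimal sketch: the six responses are hypothetical and only the level names come from the text.

```r
# Hypothetical responses; only the level names come from the text.
age <- factor(c("28-32", "33-37", "33-37", "18-27", "33-37", "38-42"))

# Find the most frequent level and make it the reference (base) level.
most_frequent <- names(which.max(table(age)))
age <- relevel(age, ref = most_frequent)

print(levels(age))
```

After releveling, the first level is the reference category, so dummy coding omits it and every remaining coefficient is read relative to it.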
Positive Coefficients Table 8 lists the predictors with positive coefficients from the Lasso
logistic regression model. These variables increase the chances of getting a job that matches
one’s qualifications.
Table 8: Positive Coefficients in LASSO
• Intercept: The intercept coefficient of 2.0384 represents the starting point, or baseline
log-odds, of finding a job match when all other factors are at their default categories (base
level categories).
• Gender (Male): A positive coefficient of 0.1806 means that being male slightly increases
the likelihood of getting a job that matches one’s qualifications compared to being female.
• County Sweden (Stockholm County): A positive coefficient of 0.2271 shows that living
in Stockholm County increases the chances of getting a job that matches one’s
qualifications compared to living in Vastra Gotaland County.
• Education Level (PhD and above): The largest positive coefficient among the predictors,
1.3629, indicates that having a PhD or a higher degree greatly increases the likelihood of
finding a job that matches one’s qualifications compared to having a Master’s degree.
Negative Coefficients Table 9 lists the predictors with negative coefficients from the Lasso
logistic regression model. These variables decrease the likelihood of a job match.
• Age (18-27): A small negative coefficient of -0.01485 indicates that individuals aged
18-27 are slightly less likely to experience a job match than the 33-37 base group.
• Age (28-32): A negative coefficient of -0.18515 shows a somewhat larger decrease in job
match likelihood for this age group relative to the 33-37 base group.
• Arrival Year (2022): A negative coefficient of -0.04750 suggests that those who arrived
in 2022 are slightly less likely to find a job match than those who arrived in 2023-2024.
Table 9: Negative Coefficients in LASSO
• Has Job (Yes, part-time): The most substantial negative coefficient, -3.36002, suggests
that part-time employment is strongly associated with a lower likelihood of a job match.
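Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios relative to the base level, and the inverse logit turns a sum of coefficients into a predicted probability. The sketch below uses the coefficient values reported above. The vector names are illustrative. Lasso estimates are shrunk toward zero, so the resulting figures should be read as rough indications rather than unbiased effect sizes.

```r
# Coefficients as reported in the text (log-odds scale).
coefs <- c(Intercept = 2.0384,
           Gender_Male = 0.1806,
           County_Stockholm = 0.2271,
           Education_PhD = 1.3629,
           Age_28_32 = -0.18515,
           Has_Job_part_time = -3.36002)

# exp() turns a log-odds coefficient into an odds ratio relative to the base level.
odds_ratios <- exp(coefs[-1])

# plogis() is the inverse logit: predicted probability of a job match for a
# respondent at every base level, and for the same respondent working part-time.
p_base <- plogis(coefs[["Intercept"]])
p_part_time <- plogis(coefs[["Intercept"]] + coefs[["Has_Job_part_time"]])

print(round(odds_ratios, 3))
print(round(c(base = p_base, part_time = p_part_time), 3))
```

An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient, which makes the relative size of the part-time effect easier to see than on the log-odds scale.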
Unimportant Variables Table 13 lists the variables whose coefficients were shrunk to zero in
the Lasso logistic regression model. These variables are Age (38-42, 43 and above), Arrival
Year (2018 and before, 2019, 2020, 2021), County Sweden (Skane County, Uppsala County,
Other County), Education Level (Bachelor’s degree, Higher secondary and below), and
Educational Background (Electrical Engineering, Mechanical Engineering). A zero coefficient
indicates that, given the other predictors and the chosen penalty, these categories add no
predictive value beyond their base-level category when predicting whether a participant’s job
matches their qualifications.
The performance of the Lasso logistic regression model was evaluated using several key met-
rics: accuracy, precision, recall, and F1-score. These metrics help in understanding how well
the model is able to correctly predict job matches and how it balances between false positives
and false negatives.
Table 10: Confusion Matrix: Cross-tabulation of Predicted vs. Actual Job Match
                 Actual
Predicted     No    Yes   Total
No           120     23     143
Yes           58    357     415
Confusion Matrix The confusion matrix (Table 10) shows the cross-tabulation of predicted
versus actual job matches. It is a crucial tool for visualizing the performance of a classification
model.
• True Positives (TP): The model correctly predicted 357 actual job matches as matches.
• True Negatives (TN): The model correctly identified 120 non-matches as non-matches.
• False Positives (FP): The model incorrectly predicted 58 non-matches as matches.
• False Negatives (FN): The model incorrectly identified 23 actual matches as non-matches.
Table 11: Performance Metrics for Lasso Logistic Regression Model
Metric Value
Accuracy 0.8548
Precision 0.8602
Recall (Sensitivity) 0.9395
F1 Score 0.8981
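Accuracy is the proportion of all predictions that were correct; the model classified 85.48%
of respondents correctly. Precision shows how many of the predicted job matches were actual
matches; a precision of 86.02% means that most of the job matches predicted by
the model were indeed correct. Recall (Sensitivity or True Positive Rate) shows how many
of the actual job matches were correctly identified by the model. A recall of 93.95% indicates
that the model correctly identified a large proportion of the actual job matches. F1 Score is the
harmonic mean of precision and recall, providing a balance between the two. The F1 score of
89.81% indicates that the model balances precision and recall well when predicting job matches.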
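The four metrics in Table 11 follow directly from the confusion-matrix counts, as the short base R sketch below shows. The counts are those of Table 10, and the variable names are illustrative.

```r
# Counts from the confusion matrix in Table 10.
TP <- 357; TN <- 120; FP <- 58; FN <- 23

accuracy  <- (TP + TN) / (TP + TN + FP + FN)
precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)
f1        <- 2 * precision * recall / (precision + recall)

print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, f1 = f1), 4))
```

Rounded to four decimals, the results agree with the values listed in Table 11.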
6 Discussion
6.1 Results
The starting point of this thesis was the experience of moving from Kerala, a state in India, to
Sweden and the numerous questions regarding migration from Kerala to Sweden. The problem
in answering many of these questions stemmed from a lack of data, as no data exists on people
from Kerala in Sweden. To address this issue, a survey was designed, and data was collected
from individuals of Kerala origin, enabling the answers to some of these questions.
The results from the survey indicate that immigration from Kerala to Sweden began be-
fore 2000 and includes individuals of all ages (including pensioners), living in various types of
housing, with apartments being the most common. Furthermore, the survey reveals that Vas-
tra Gotaland County has the highest population of Malayalis in Sweden, and there are more
females than males within the Malayali community. Most respondents expressed satisfaction
with their lives in Sweden, although not all did. Issues were raised regarding low-paying jobs
and dissatisfaction with personal, family, and overall life after moving to Sweden. All districts
in Kerala are represented in Sweden, although responses were not received from some counties.
Most counties in Sweden have Malayali representation. Interestingly, a few individuals did not
encounter any challenges during their first year. In terms of housing, while apartments are the
most common choice, houses are also quite popular.
The Lasso logistic regression model proved effective in predicting job-qualification align-
ment with notable accuracy and precision. From this data analysis, it was found that higher
educational qualifications significantly improve the chances of finding a job that matches one’s
skills. Although Sweden is recognized as one of the best countries for gender equality, the
survey indicates that males still have a higher chance of securing a qualified job than females.
Stockholm County, which includes the capital city of Sweden, appears to be the most promising
location for finding a qualified job.
6.2 Implications
Migrants coming to Sweden may potentially improve their job prospects by getting higher
education, such as Master’s or PhD degrees. The timing of the move is likely to matter;
for example, in the model those who arrived in 2022 seem less likely to have jobs that fit their
skills than those who moved more recently. It might be helpful to think about when to move,
as changes in the job market and migration rules can affect job chances. For policymakers,
it’s important to
offer support that fits the needs of different groups of migrants based on when and where they
arrived. Also, some migrants are facing lower than regulated salaries so it’s important to make
sure workers know their rights. Many migrants are also unhappy with their personal and family
life after moving, so policies that help with work-life balance and overall well-being would
be useful. By improving education opportunities and providing targeted support, along with
addressing job quality and satisfaction, both migrants and policymakers can work together to
achieve better job matches and a better quality of life in Sweden.
The main issue with the study was the lack of a sampling frame for people from Kerala, which
made it impossible to use a probability sampling method. As a result, a combination of non-
probability sampling methods, such as judgmental sampling and snowball sampling, was used.
Both judgmental sampling and snowball sampling suffer from selection bias, limited
generalizability, and difficulty in reaching diverse populations (Atkinson and Flint, 2001). To
reduce selection bias, efforts were made to include all counties in Sweden and reach different
age groups and migrants from various years through judgmental sampling. However, the sample
may not represent all Kerala migrants in Sweden, so caution should be exercised when gen-
eralizing the results to the whole population. A further concern is that the data only captures
a snapshot at a single point in time, which limits the understanding of long-term integration
experiences.
Future research should aim to use a larger, more varied sample and include methods that
track changes over time to provide a clearer picture of migrant experiences. Since detailed
information about the overall population is lacking, a proper sampling frame could not be
established. The implication is that the results from the analysis may be difficult to generalize.
Furthermore, some problems with missing information were encountered.
Future research should also consider comparing the experiences of migrants from other
regions to identify broader patterns and challenges. Longitudinal studies that track changes
over time will provide a deeper understanding of how job alignment and integration evolve.
Given the current limitations in population data, future studies should aim to include a larger
and more diverse sample to better understand the experiences of Kerala migrants in Sweden.
References
Agresti, A. (2013). Categorical Data Analysis. John Wiley & Sons, 3rd edition.
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. John Wiley &
Sons.
Atkinson, R. and Flint, J. (2001). Accessing hidden and hard-to-reach populations: Snowball
research strategies. Social Research Update, 33:1–4.
Bratsberg, B., Ragan, J. F., and Nasir, Z. M. (2002). Foreign-born workers in the US labor
market. Journal of Economic Literature, 40(1):105–138.
Cassarino, J.-P. (2004). Theorising return migration: The conceptual approach to return mi-
grants revisited. International Journal on Multicultural Societies, 6(2):253–279.
Chiswick, B. R. and Miller, P. W. (2003). The skills of immigrants in the US: Education and
gender. Research in Labor Economics, 22:229–255.
Evans, J. R. and Mathur, A. (2009). Online surveys and migrant populations. Journal of
Migrant Studies, 15(2):123–145.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear
models via coordinate descent. Journal of Statistical Software, 33(1):1–22.
Groves, R. M., Fowler Jr, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., and Tourangeau,
R. (2011). Survey Methodology. Wiley.
Larsson, C. (2024). Indian high-skilled labor migrants in Sweden: A study about social
integration, interpersonal communication, and national identification.
Lundborg, P. and Skedinger, P. (2013). Ethnic enclaves and the economic success of
immigrants: evidence from Sweden. The Scandinavian Journal of Economics, 115(3):905–929.
Martin, P. (2009). Recession, Migrants, and the Welfare State. University of California Press.
Perla, R. J. and Provost, L. P. (2012). Judgment sampling: A health care improvement perspec-
tive. Quality Management in Health Care, 21(3):169–175.
Piracha, M. and Vadean, F. (2013). Immigrant overeducation: A literature review and gaps.
Journal of Economic Surveys, 27(4):963–988.
Zhang, Z. (2016). Multiple imputation with multivariate imputation by chained equation (mice)
package. Annals of Translational Medicine, 4(2):30.
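Appendix
Table 12: Base-level categories in the Lasso model
Variable (base level)
Gender (Female)
Age (33-37)
County Sweden (Vastra Gotaland County)
Education Level (Masters degree)
Educational Background (Information Technology)
Has Job (Yes, full-time)
Arrival Year (2023-2024)
Table 13: Variables with coefficients shrunk to zero in the Lasso model
Variable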
Age (38-42)
Age (43 and above)
Arrival Year (2018 and before)
Arrival Year (2019)
Arrival Year (2020)
Arrival Year (2021)
County Sweden (Skane County)
County Sweden (Uppsala County)
County Sweden (Other County)
Education Level (Bachelors degree)
Education Level (Higher secondary and below)
Educational Background (Electrical Engineering)
Educational Background (Mechanical Engineering)
B Model Implementation
R Code The following R code was used to implement the Lasso logistic regression model:
    library(glmnet)
    # x: numeric matrix of dummy-coded predictors; y: binary Job_Match (0/1)
    # Perform LASSO logistic regression with cross-validation
    lasso_model <- cv.glmnet(x, y, alpha = 1, family = "binomial")
    best_lambda <- lasso_model$lambda.min
    # Fit the final model using the best lambda value
    final_model <- glmnet(x, y, alpha = 1, family = "binomial",
                          lambda = best_lambda)
In this code, cv.glmnet is used to perform cross-validation and to find the optimal
lambda value (lambda.min) that minimizes the cross-validation error. The final model is
then fitted using this best lambda value, which is 0.0128.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{357 + 120}{558} = 0.8548 \;(85.48\%) \]
\[ \text{Precision} = \frac{TP}{TP + FP} = \frac{357}{357 + 58} = 0.8602 \;(86.02\%) \]
\[ \text{Recall (Sensitivity or True Positive Rate)} = \frac{TP}{TP + FN} = \frac{357}{357 + 23} = 0.9395 \;(93.95\%) \]
D Multicollinearity
Lasso binary logistic regression was selected as the primary modelling technique for predicting
job matches. This method is particularly advantageous when predictors are multicollinear and
when feature selection is needed at the level of individual categories.
Table 14 shows the Variance Inflation Factors (VIF), reported as generalized VIF (GVIF), for
the independent variables. High GVIF values (typically > 10) or high GVIF^(1/(2·Df)) values
(typically > 2.5) suggest multicollinearity issues. Since all factors are below these thresholds,
there seems to be very little reason for concern about multicollinearity.
E LASSO Coefficient Paths for Various Predictors
[Figure: LASSO coefficient paths. Horizontal axis: Log(Lambda); vertical axis: Coefficients.
Each path is labelled with a predictor index; labels up to 25 appear in the plot. Legend entries
recoverable from the source:
1: GenderMale
2: Age18-27
3: Age28-32
4: Age38-42
5: Age43 and above
6: Arrival_Year2018 and before
7: Arrival_Year2019
8: Arrival_Year2020
9: Arrival_Year2021
10: Arrival_Year2022
11: County_SwedenOther county
12: County_Swedenskåne county
13: County_Swedenstockholm county
14: County_Swedenuppsala county
15: County_Swedenvästerbotten county
16: Education_LevelBachelor’s degree
17: Education_LevelHigher secondary and below
18: Education_LevelPhD and above
Legend entries 19 to 25 are missing from the source.]
The plot shows how the coefficients of different predictors evolve as the regularization param-
eter, Lambda, changes.
F Survey Questions and Corresponding Bar Plots
[Figure: bar plot. x-axis: Gender; y-axis: Count.]
Figure 3: 2. Age
[Figure: Bar Plot of Age. x-axis: Age; y-axis: Count.]
[Figure: Bar Plot of Move_Reason. x-axis: Move_Reason; y-axis: Count.]
[Figure: bar plot. x-axis: Arrival_Year; y-axis: Count.]
Figure 6: 5. Did you move to Sweden
[Figure: bar plot. x-axis: Move_With_Family; y-axis: Count.]
Figure 7: 6. What is your current visa status?
[Figure: Bar Plot of Visa_Status. x-axis: Visa_Status; y-axis: Count.]
[Figure: bar plot. x-axis: Visa_Application; y-axis: Count.]
[Figure: bar plot. x-axis: Visa_Interview; y-axis: Count.]
Figure 10: 9. What is your highest level of education?
[Figure: Bar Plot of Education_Level. Axes: Education_Level, Count.]
Figure 11: 10. What is your educational background?
[Figure: bar plot. Axes: Educational_Background, Count.]
Figure 12: 11. Do you have a job?
[Figure: Bar Plot of Has_Job. x-axis: Has_Job; y-axis: Count.]
Figure 13: 12. Does your current job match your qualifications?
[Figure: Bar Plot of Job_Match. x-axis: Job_Match; y-axis: Count.]
[Figure: bar plot. x-axis: Salary_Compliance; y-axis: Count.]
[Figure: bar plot. x-axis: Job_Satisfaction; y-axis: Count.]
Figure 16: 15. What do you think are the main problems in getting a job in Sweden?
[Figure: bar plot. x-axis: Job_Problems; y-axis: Count.]
Figure 17: 16. Are you satisfied with medical care in Sweden?
[Figure: Bar Plot of Medical_Care_Satisfaction. x-axis: Medical_Care_Satisfaction; y-axis: Count.]
Figure 18: 17. Have you experienced any health issues after moving to Sweden?
[Figure: Bar Plot of Health_Issues. x-axis: Health_Issues; y-axis: Count.]
Figure 19: 18. How much do you care about your mental and physical health after moving to Sweden?
[Figure: Bar Plot of Health_Care. x-axis: Health_Care; y-axis: Count.]
Figure 20: 19. How satisfied are you with Swedish food culture compared to Kerala food culture?
[Figure: Bar Plot of Food_Culture_Satisfaction. x-axis: Food_Culture_Satisfaction; y-axis: Count.]
Figure 21: 20. How satisfied are you with the amount of personal or family time you have after moving to Sweden?
[Figure: Bar Plot of Family_Time_Satisfaction. x-axis: Family_Time_Satisfaction; y-axis: Count.]
Figure 22: 21. How satisfied are you with your life after moving to Sweden?
[Figure: Bar Plot of Life_Satisfaction. x-axis: Life_Satisfaction; y-axis: Count.]
[Figure: bar plot. x-axis: District_Kerala; y-axis: Count.]
Figure 24: 23. Which county are you currently residing in Sweden?
[Figure: bar plot. x-axis: County_Sweden; y-axis: Count.]
[Figure: bar plot. x-axis: Cultural_Integration; y-axis: Count.]
Figure 26: 25. Have you faced any challenges after moving to Sweden? If yes, how often have you faced these came?
[Figure: bar plot. x-axis: Challenges_Faced; y-axis: Count.]
Figure 27: 26. What type of accommodation do you have?
[Figure: Bar Plot of Accommodation_Type. x-axis: Accommodation_Type; y-axis: Count.]
[Figure: bar plot. x-axis: Rent_Range; y-axis: Count.]
[Figure: bar plot. x-axis: Parental_Status; y-axis: Count. Recovered caption fragment: "yes, how satisfied are you with parental benefits in Sweden?"]
Figure 30: 29. How much knowledge did you have about Sweden before moving?
[Figure: Bar Plot of Sweden_Knowledge. x-axis: Sweden_Knowledge; y-axis: Count.]
[Figure: missing data pattern plot; caption not recovered. Variable labels recovered: County_Sweden, Arrival_Year, Job_Match, Has_Job, Gender, Age. 545 of 558 rows are complete; 15 values are missing in total.]
Table 15: Table with Highlighted Imputed Values in the Model Data
Figure 32: Missing Data Pattern
[Figure: missing data pattern plot for the survey variables. The margins show 60 distinct patterns; 559 of the 716 responses are complete, and 243 values are missing in total. Missing values per variable, as recovered from the figure margin:]
Education_Level: 0
Educational_Background: 0
Move_With_Family: 1
Salary_Compliance: 1
Accommodation_Type: 1
Has_Job: 2
Job_Match: 2
Job_Satisfaction: 2
Health_Care: 2
Family_Time_Satisfaction: 2
Arrival_Year: 3
Sweden_Knowledge: 3
Age: 4
Life_Satisfaction: 4
Gender: 5
Visa_Status: 5
County_Sweden: 6
Visa_Application: 7
Move_Reason: 8
Visa_Interview: 8
Food_Culture_Satisfaction: 9
Challenges_Faced: 11
Job_Problems: 16
District_Kerala: 16
Cultural_Integration: 17
Parental_Status: 19
Medical_Care_Satisfaction: 22
Rent_Range: 25
Health_Issues: 42