
SweVoc A Swedish Vocabulary Resource For

SweVoc is a Swedish vocabulary resource aimed at enhancing vocabulary knowledge for language teaching and learning, particularly in Computer-Assisted Language Learning (CALL) applications. It categorizes vocabulary items based on usage and frequency, integrating various sources to create a comprehensive word list of approximately 8,500 entries. The resource addresses the lack of a structured base vocabulary for Swedish, facilitating better communication and understanding in both educational and assistive contexts.


SweVoc - A Swedish vocabulary resource for CALL

Katarina Heimann Mühlenbock and Sofie Johansson Kokkinakis

Dept of Swedish, University of Gothenburg, Gothenburg, Sweden


[email protected], [email protected]

Abstract

The core in language teaching and learning is vocabulary, and access to a delimited set of words for basic communication is central for most CALL applications. Vocabulary characteristics also play a fundamental role for matching texts to specific readers. For English, the task of grading texts into different levels of difficulty has long been facilitated by the existence of word lists serving as guides for vocabulary selection. For Swedish, the situation is with a few exceptions less fortunate, in that no base vocabulary organized according to aspects of usage has existed. The Swedish base vocabulary – SweVoc – is an attempt to remediate this. It is a comprehensive resource, aimed at differentiating vocabulary items into categories of usage and frequency. As we are of the opinion that no corpus of written text can do full justice to general language use, we have utilized materials from a second language as reference for delimiting the category of core words. Another belief is that the task of defining a base vocabulary cannot be fully automatic, and that a considerable amount of manual, traditional lexicographic work has to be invested. Hence, the present approach is not an innovative, but a methodological approach to word list generation for a specific purpose, much like LSP. We anticipate SweVoc to be integrated in CALL applications for vocabulary assessment, language teaching and students’ practice.

1 Background

Vocabulary knowledge plays a central role in a person’s ability to communicate, as well as to read and understand written text. It is therefore a central issue in many readability assessment approaches. Prominent researchers within readability and language assessment, such as (Thorndike, 1921; Vogel and Washburne, 1928; Patty and Painter, 1931; Thorndike and Lorge, 1944; Dale and Chall, 1948; Spache, 1953), and more recently (Nation, 1990; Nation, 2001), all included specific word lists as a criterion to measure text difficulty for English. In quantitative associative studies of readability, some scheme for measuring the vocabulary difficulty is set up, compared to a predefined criterion, and expressed by a coefficient of correlation. In this way, the word lists may be constructed in order to mirror vocabulary difficulty corresponding to school grade levels. Thorndike’s (1921) word list of 10,000 words, later on revised into a list of 30,000 words (Thorndike and Lorge, 1944) and Spache’s revised word list (Spache, 1974) of 1,040 entries, were mainly constructed by judgment and common sense. West published in 1953 the General Service List – a list of 2,000 words selected to represent the most frequent words in an English corpus.

Vocabulary is also an important issue when producing language-supportive aids for persons with deficient communication capability. Insufficient vocabulary knowledge implies a decrease in expressive power of an utterance or written text, and the receptive language skills are also heavily dependent upon the individual vocabulary range. In order to obtain
maximum benefit from language supportive tools, the resources provided as word lists ought to be chosen with care in order to conform to individual and situational needs. Also in generating LSP (language for specific purposes) and particular domain vocabulary lists, a list of general base vocabulary is needed in order to exclude the most common and general words.

In the following we are making a distinction between base vocabulary and core vocabulary. A language teaching situation might involve a more extensive base vocabulary, while assistive technology applications such as symbol boards for communication would benefit from a restricted core vocabulary, expandable with complementary vocabulary items from different domains. The present approach is an attempt to combine both models, i.e. it is a Swedish core vocabulary word list, supplied with words belonging to a broader base vocabulary.

Defining a core vocabulary is a task associated with several methodological challenges. Lee (2001) has enumerated some of them. First of all, the concept of core vocabulary has to be settled. Several working definitions exist, out of which the most contested point seems to be whether the list is based on, and intended for, applications within written or spoken language, or both. If one decides to adopt the view that a core vocabulary is by definition that which is central to the language as a whole, it rules out for instance approaches based on frequency counts of words in written language. Furthermore, it should be untarnished from any stains of genre, style, register or lect association.

In addition to the theoretically founded issues, problems of a more practical nature also arise. Although a major part of verbal communication is said to take place with the use of 1,500 - 2,000 words (West, 1953), this figure must be considered in the light of language-specific properties, of the type of communication, and above all, as a function of the word concept. Counting lexemes, lemmas, base form orthographic words or multiwords renders different figures. For English, the notion of word family plays a central role when defining word lists for educational purposes. Lee (2001), citing Schmitt (2000), maintained that

    people in the field seem to agree that the "word family" is the most meaningful unit to work with and pedagogically most useful.

The word family concept was put forth by Bauer and Nation (1993), from a reader’s perspective defined to comprise

    a base word and all its derived and inflected forms that can be understood by a learner without having to learn each form separately.

If all the lemmas belonging to a specific word family are considered as one member of the word list, Hirsh and Nation (1992) found that a vocabulary size of at least 5,000 entries was needed in order to read unsimplified fiction texts. The same study also showed that graded readers beginning at a level of 2,600 word families would be of great benefit in language teaching.

An attempt to construct a levelled base vocabulary for another language than English was made by De Mauro (1980) when he published a list of 7,400 Italian words, categorized into three different groups according to use. The only attempt in this direction for Swedish was made by Forsbom (2006), who derived a base vocabulary pool from a corpus of 1 million words – the Stockholm-Umeå Corpus (SUC) (Källgren, 1992). This was achieved by ranking base word forms according to adjusted frequency over the entire corpus, and then adopting a subsequent filtering technique that sorted out entries which did not occur in more than three out of nine genres in the corpus. The result was a Swedish base vocabulary pool (henceforward referred to as SBVP), with a total amount of ≈ 8,200 word base forms, mirroring the use of written Swedish in the early nineties.

SBVP alone can neither be considered to reflect modern language use, nor to be informative enough to independently serve as a source of words pertaining to a restricted core vocabulary, since it is based solely on written language. As already mentioned, the base word forms in SBVP are ranked according to adjusted frequency (AF: see equation 1), i.e. relative frequency weighted with dispersion over the 9 categories (genres) in SUC. It implies that the vocabulary consists of those words that are not genre dependent, given the subdivisions of a small-size text corpus. Furthermore, it lacks information at the lexeme
level, which reduces its feasibility for purposes demanding a semantic disambiguation between words. A base form word like the Swedish noun gång has for instance four lexeme representations, belonging to different base vocabulary categories. The first refers to ’time’ and is considered to be a core vocabulary item, while the sense ’path’ is not. The second

Issues regarding a distinction between lemma and lexeme concepts are discussed in Gardner (2007). Another flaw in SBVP is the absence of internal levelling, which would be required in order to serve as a list of core vocabulary words. In the present approach, it was hence enriched with labels indicating levels of general use from three additional sources: (1) a translated base vocabulary, (2) a list of words from modern vocabulary, and (3) a dictionary of words denoting domestic life activities and participation in community activities. The final product is SweVoc, a base vocabulary word list, consisting of ≈ 8,500 words, mainly lemma forms, divided into five different categories.

$$AF = \left( \sum_{i=1}^{n} \sqrt{d_i x_i} \right)^2 \qquad (1)$$

where
AF = adjusted frequency
$d_i$ = relative size of category i
$x_i$ = frequency in category i
$n$ = number of categories

2 Material

SweVoc is a comprehensive resource, based on lists of lexical items and texts from four different sources:

1. The backbone was the monolingual Swedish base vocabulary pool (SBVP) (Forsbom, 2006), derived from the SUC corpus (Källgren, 1992), containing 8,213 base form entries. Personal nouns, numbers and punctuation marks were omitted, which reduced the number of entries to ≈ 7,400.

2. The second major resource is a translation of the earlier mentioned work by De Mauro (1980), Guida all’uso delle parole, henceforward referred to as GUP. It consists of a vocabulary of 7,400 words, mainly lemma forms, divided into three categories:

   • 2,100 basic words, regarded as fundamental for communication, representing a core vocabulary (C)
   • 2,400 words used in every-day communication (D)
   • 2,900 words highly frequent in written text (H)

3. The Kelly modern vocabulary list (Johansson Kokkinakis and Volodina, 2011) was used in order to ensure that frequent words used in modern settings were included. The Swedish version of Kelly is derived from a large modern corpus of web texts, and a subset of ≈ 500 words translated between Swedish and Italian was employed.

4. The ICF (Socialstyrelsen, 2003) is a classification of health and health-related domains, ranging from body structure to individual and societal issues. It was used as a reference word list in order to ensure coverage of words related to every-day matters.

3 Preprocessing

The Italian list of basic words was translated into Swedish by a second-language-speaker of Italian. Localisms and archaisms in the source language were ignored. The reason for using a foreign resource was two-fold. First, a base vocabulary should be selected in order to cover both universal concepts and essential phenomena and situations in the main local environment. Secondly, the manual translation task revealed ambiguities due to different usage of words and word senses among two syntactically and lexically distant languages, which contributed to a more fine-grained levelling of words into different subcategories.

The Kelly modern word list was a result of the EC-financed project Kelly <http://kellyproject.eu>. The aim of the project was the generation of monolingual word lists of nine languages: Arabic, Chinese, English, Greek, Italian, Norwegian, Polish, Russian and Swedish. The lists were generated from many sources including web corpora in order to reflect a modern vocabulary. The lists were then all translated into the eight other languages, generating 72 language pairs. The Italian-Swedish is one of them. The lists were then finally merged to 36 lists. These lists are used in the Keewords language learning tool <http://Keewords.com>.

Several structural differences between the two main sources – SBVP and GUP – caused problems already at the preprocessing stage of SweVoc. As is shown in table 1, the tag set used in SBVP is in PAROLE-format with morphosyntactic information, while GUP was based simply on part-of-speech. In addition to automatic conversion into SUC-format part-of-speech labelling, a considerable amount of manual work was required to make the lists comparable. However, as mentioned already in the introduction, we are of the firm view that no wordlist aimed at specifying a base vocabulary can be produced without a considerable degree of human intervention. We hence regard the present approach to be a pragmatic and feasible way to perform a restricted task.

Rank  Lemma  Adj.Freq.     Contr.  (WF.PoS.Freq)
5     en.DI  25958.046833  9       ett.DI@[email protected] en.DI@[email protected]
140   en.MC  726.135618    9       en-.MC0000C.2 ett.MCNSNIS.276 en.MCUSNIS.463
167   en.PI  606.653923    9       ett.PI@[email protected] en.PI@[email protected] enom.PIUS0S.1
5708  en.RG  5.661842      4       en.RG0S.9

Table 1: Four different entries of the word en in SBVP

4 Word list compilation

Entries in SBVP were checked against GUP in order to find candidates for inclusion into SweVoc. As already mentioned, the lists were comparable in size (≈ 7,400 words), but differed largely as regards compilation methods and contents. As was expected, many words in each of the two lists corresponded to multiple entries in the other. Multiword expressions and structural differences between the languages also required particular consideration. One such example is the Swedish verb be ’ask, pray’, present among the 1,000 words with highest adjusted frequency in SBVP. GUP provides three different lexemes for this verb, namely chiedere ’ask’, pregare ’pray’ and supplicare ’beseech’. All the words fall into category (C) in Italian, which would not necessarily be true for Swedish. In the opposite direction, the Italian polysemous noun rapporto, also among the words in category (C), is covered by three different entries in SBVP: förhållande and relation ’relationship’, both among the top 1,000 entries, and rapport ’report’, with a lower adjusted frequency.

The degree of coverage of GUP lemmas in SBVP was also measured. It turned out that overall 37.5% of the translated lemmas were present also in SBVP, but that the (C) group had a significantly higher coverage. Of the total 2,143 candidate lemmas considered as fundamental for communication, 81.4% were also present in the SBVP. Entries in the daily vocabulary (D) group were covered to 20.6%, while 28.6% of the high-frequency lemmas (H) in GUP were present also in SBVP. Of the 483 entries in the Kelly word list which did not occur in GUP, 288 were present in SBVP, i.e. 59.6%.

4.1 The final SweVoc

The GUP and Kelly word list entries that were present in SBVP were used to populate the first four categories in SweVoc, i.e. the core vocabulary items (C), words belonging to every-day language (D), high frequency words (H), and words from modern vocabulary (K). Additionally, items lacking in SBVP but present in both ICF and GUP, denoting daily activities or phenomena, were included. An example of such a word is andning ’breathing’. Words present only in ICF, denoting every-day situations and objects, were also added. The Swedish verb möblera ’furnish’ exemplifies such a word. Finally, a supplementary group of words present only in SBVP was preserved, denoted by the category label (S). The word samband ’connection’ serves as an example from this category. An entry in SweVoc consists of information regarding rank in SBVP, the lemma form, the part-of-speech, and one or more category belongings. The entry for form is given as an example below. It is a polysemous noun, found among the 223 most frequent base forms in SBVP, and different senses of the lemma belong to different SweVoc categories.

Rank  Lemma  POS  Categories
223   form   NCU  C, D

In conclusion: the present version of SweVoc contains 7,572 lemmas pertaining to one or more of five different categories. A lemma that is present in more than one category has discriminatory lexical senses, which implies that the number of lexemes amounts to 8,468, see table 2. Category (C) is dominated
by nouns (38%), verbs (23%) and adjectives (13%). The category of words related to every-day matters (D) is mainly composed of nouns (66% of the total amount of lexemes), while verbs and adjectives account for only 18% and 12% respectively. In the group of high-frequency words (H), nouns were found to cover 55%, verbs 21% and adjectives 18% of the lexemes. From the perspective of core vocabulary alone, category (C) includes 21% of the total nouns in SweVoc, 31% of all verbs, and 23% of all the adjectives. All pronouns and determiners were included in (C). All prepositions were found in category (C), except the word tills ’until’, which was referred to the (D) category. One conjunction (visserligen ’certainly’) was found in the (H) category, while 58% of the conjunctions appeared in category (C) and the remaining 40% in category (S). Figures regarding ratios of participles and adverbs are generally somewhat unreliable since different principles were used for corpus part-of-speech tagging in SUC and word list creation of GUP. Specifics regarding the part-of-speech distributions in each category are given in table 3.

Label  Category                           Ex      Lexemes
C      Core vocabulary                    säga      2,201
D      Words for every-day communication  soffa     1,019
H      High frequency words               sorg      1,518
K      Words in Kelly modern vocabulary   debatt      288
S      Supplementary words from SBVP      ting      3,442
Total                                               8,468

Table 2: SweVoc entries per category

POS        C      D      H    K      S
Nouns    844    670    831  139  1,436
Verbs    502    181    323   24    575
Adj      295    123    277   52    510
Adv      176     12      8   66    427
Part     168     29     58    7    194
Prep      42      1      0    0      0
Conj      29      0      1    0     20
Pron      65      0      0    0      0
Det       16      0      0    0      0
Other     64      3     20    0    280
Total  2,201  1,019  1,518  288  3,442

Table 3: Part-of-speech distributions in each SweVoc category

5 Evaluation

In order to validate the reliability of SweVoc, evaluation was performed by coverage tests. It was assumed that the coverage of SweVoc would vary between texts of different types and from various genres. If the core vocabulary items were correctly chosen, the degree of words from this category would correspond to textual complexity, i.e. easier texts would contain more words from category (C). Another assumption was that the ratio of words from category (D) would vary depending on genre, that it would be much smaller, and that the words from the Kelly list (K) would appear more frequently in recent texts. In order to test these hypotheses, evaluation was performed on texts from three different sources:

1. The corpus LäsBarT (LB), which is a corpus of 1.4 million words, containing children’s fiction for ages 6-12, and four easy-to-read text varieties:
   • Easy-to-read news texts
   • Easy-to-read community information texts
   • Easy-to-read children’s fiction
   • Easy-to-read adults’ fiction

2. The corpus SUC

3. News text from the daily newspaper Göteborgs-Posten (GP) published in 2007

It was found that, overall, 91.4% of the tokens in LB, 82.7% of the tokens in SUC, and 83.0% of the tokens in GP were represented at the lemma level in SweVoc, while tokens belonging to the core vocabulary (C) amounted to 80.3% in LB, 68.4% in SUC, and 69.9% in GP texts. The ratios of words related to daily matters (D) were about the same in all texts (≈ 1.3%), but the ratios of high-frequency words were significantly higher in SUC and GP than in LB (p < 0.001). Supplementary words (S) were found to be more frequent in SUC than in both GP and LB, which was expected since the original SBVP was retrieved from SUC. By studying the figures in table 4 we can see that the degree of words in category (C) differs substantially between the ordinary and the easy texts, and also that the percentage of core vocabulary items is higher in fiction than in news and informative texts.

Type/genre  SweVoc     C    D    H    K    S
ECF           92.5  82.4  0.8  2.1  0.7  6.5
OCF           90.6  80.4  1.0  1.9  0.7  6.6
EAF           93.4  83.1  0.9  2.2  0.6  6.6
OAF           86.3  75.8  1.0  2.4  0.8  6.3
EN            91.5  78.8  1.8  3.9  0.6  6.5
ON            82.2  67.6  1.7  3.8  1.1  8.0
EI            90.6  79.2  1.3  3.3  0.5  6.4

Table 4: SweVoc lemmas, percentage of tokens in different subcorpora
ECF = Children’s easy-to-read fiction
OCF = Children’s ordinary fiction
EAF = Adults’ easy-to-read fiction
OAF = Adults’ ordinary fiction (SUC K)
EN = Easy-to-read news
ON = Ordinary news (SUC A and GP)
EI = Easy-to-read community information

6 Foreseen improvements

Entries in the present version of SweVoc preserve information "inherited" from the translated word list GUP, in that a lemma might be categorized with several labels depending on which lexeme it refers to. The Swedish polysemous word panna (’front’, ’pan’, ’oven’) is for instance labelled both as a core word (C) and as a word referring to every-day issues (D). One valuable resource for disambiguation is the Swedish word association lexicon Saldo (Borin and Forsberg, 2009), which is a modern Swedish semantic and morphological lexical resource. It is superficially similar to Princeton WordNet (Fellbaum, 1998), but different in the principles by which it is structured. The organizational principles of Saldo consist of two primitive semantic relations, or descriptors, one of which is obligatory and the other optional. When looking up panna in Saldo, we find three competing lexemes:

• panna..1 ansikte..1 (’face’) PRIM..1
• panna..2 laga..2 (’cook’) PRIM..2
• panna..3 elda..1 (’make fire’) PRIM..1

The semantic paths in Saldo for each of the three senses are illustrated below, each of length 6.

panna..1→ansikte→huvud→kropp→varelse→vem
(’face’→’head’→’body’→’being’→’who’)
panna..2→laga→mat→äta→leva→vara
(’cook’→’food’→’eat’→’live’→’be’)
panna..3→elda→eld→brinna→het→varm
(’make fire’→’fire’→’burn’→’hot’→’warm’)

Frequency counts in SUC reveal that 77% of the instances referred to panna..1, 15% to panna..3, and 8% to panna..2. From these figures, it seems plausible that panna..1 would be referred to category (C), and either panna..2 or panna..3 or possibly both referred to category (D).

Regarding the CALL perspective of this lexical resource, we foresee it as an asset for vocabulary instruction and also as a resource in various CALL-oriented learning platforms and applications, as
for instance the Lextutor, <http://www.lextutor.ca/>. It is also relevant for integration into a Swedish CALL platform under development, cf. Lärka <http://spraakbanken.gu.se/larka/>.

7 Results and conclusion

We found that 81% of the GUP lemmas translated and selected as candidates for inclusion into category (C) were actually to be regarded as pertaining to a core vocabulary for Swedish. Additionally, 21% of the lemmas in category (D) and 29% in category (H) were appropriate for inclusion as complementary vocabulary words.

The resulting word list – SweVoc – of ≈ 7,600 Swedish lemmas is expected to be an asset in language learning and teaching and in readability checkers. The performance of other NLP applications, such as classification tools and morphological analyzers, would also improve with access to a restricted set of base vocabulary words.

References

Laurie Bauer and Paul Nation. 1993. Word families. International Journal of Lexicography, 6(4):253–279.
Lars Borin and Markus Forsberg. 2009. All in the family: A comparison of SALDO and WordNet. In Proceedings of the Nodalida 2009 Workshop on WordNets and other Lexical Semantic Resources – between Lexical Semantics, Lexicography, Terminology and Formal Ontologies, Odense.
Edgar Dale and Jeanne S Chall. 1948. A formula for predicting readability. Educational Research Bulletin, 27:37–54.
Tullio De Mauro. 1980. Guida all’uso delle parole. Editori Riuniti, Roma.
Christiane Fellbaum, editor. 1998. WordNet: An electronic lexical database. MIT Press, Cambridge, MA.
Eva Forsbom. 2006. A Swedish Base Vocabulary Pool. In Swedish Language Technology conference, Gothenburg.
Dee Gardner. 2007. Validating the construct of word in applied corpus-based vocabulary research: A critical survey. Applied Linguistics, 28(2):241–265.
David Hirsh and Paul Nation. 1992. What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language, 8(2):689–696.
Sofie Johansson Kokkinakis and Elena Volodina. 2011. Corpus-based approaches for the creation of a frequency based vocabulary list in the EU project KELLY issues on reliability, validity and coverage. In eLex Conference, Slovenia.
Gunnel Källgren. 1992. SUC - the Stockholm - Umeå Corpus Project: Corpus-based research on models for processing unrestricted Swedish text. Technical report, Stockholm.
D. Y. W. Lee. 2001. Defining core vocabulary and tracking its distribution across spoken and written genres. Journal of English Linguistics, 29:250–278.
Paul Nation. 1990. Teaching and learning vocabulary. Heinle & Heinle, New York.
Paul Nation. 2001. Learning vocabulary in another language. Cambridge University Press, Cambridge.
W.W. Patty and W.I. Painter. 1931. Improving our method of selection high-school textbooks. Journal of Educational Research, XXIV:23–32, June.
Norbert Schmitt. 2000. Vocabulary in language teaching. Cambridge University Press, Cambridge, UK.
Socialstyrelsen. 2003. Klassifikation av funktionstillstånd, funktionshinder och hälsa.
George D. Spache. 1953. A new readability formula for primary-grade reading materials. Elementary School Journal, LIII:410–413.
George D. Spache. 1974. Good reading for poor readers. Garrard Publishing, Champaign, IL.
Edward L. Thorndike and I. Lorge. 1944. The teacher’s word book of 30,000 words. Columbia University Press, New York.
Edward L. Thorndike. 1921. The teacher’s word book. Teacher’s College, Columbia University, New York.
M. Vogel and C. Washburne. 1928. An objective method of determining grade placement of children’s reading material. Elementary School Journal, 28:373–381.
Michael West. 1953. A General Service List of English Words. Longman, London.
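
As a rough illustration of the adjusted frequency in equation 1, the Python sketch below computes AF for a single base form from its frequencies in each category (genre). The function name, the argument layout and the toy numbers are assumptions made for illustration only; they are not taken from SBVP or SUC.

```python
import math

def adjusted_frequency(freqs, sizes):
    """Adjusted frequency as in equation 1: AF = (sum_i sqrt(d_i * x_i))^2.

    freqs: frequency x_i of the word in each category (genre)
    sizes: relative size d_i of each category, expected to sum to 1
    """
    if len(freqs) != len(sizes):
        raise ValueError("freqs and sizes must have the same length")
    return sum(math.sqrt(d * x) for d, x in zip(sizes, freqs)) ** 2

# Hypothetical toy data: three equally sized categories (not SUC figures).
sizes = [1 / 3, 1 / 3, 1 / 3]
even = adjusted_frequency([30, 30, 30], sizes)   # evenly dispersed word
skewed = adjusted_frequency([90, 0, 0], sizes)   # same total, one category only
```

With these toy numbers both words occur 90 times in total, but the evenly dispersed word keeps an AF of about 90 while the word confined to one category drops to about 30. This is the sense in which AF weights frequency with dispersion.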
