Introduction to Apache Solr

2012.06.29
윤도상
dsyoon@ncue.net

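Solr Features
• Schema
– Easily define the fields of the documents to index and their field types
– Uses Lucene analyzers
– Supports dynamic fields
– Copy fields can combine several fields into a single searchable field
– Stopwords and similar word lists can be configured through external files
• Query
– HTTP interface with configurable response formats such as XML/XSLT, JSON, Python, and Ruby
– Faceted search based on queries and field values
– Sort order can be specified in the query
– Easy tuning of search scores
– Per-field boosts can be applied in the query
• Core
– Query handlers and an extensible XML format
– Duplicate-document detection based on the unique key field
• Caching
– Configurable caches for query results, filters, and documents
– User-level cache configuration is supported
• Replication
– Efficient index distribution over rsync transport
• Admin Interface
– Reports cache, update, and query status
– Provides a debugger for text analyzers
– Provides a web query interface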

Overall Architecture

High Availability

Schema.xml
• Overall
<schema>
<types> … </types>
<fields> … </fields>
<uniqueKey />
<solrQueryParser />
<copyField />
<dynamicField />
</schema>

Schema.xml
• Type
<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
  <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" />
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <!-- an analyzer chain needs a tokenizer before its filters -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" />
    </analyzer>
  </fieldType>
</types>

Schema.xml
• Fields
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="release_dt" type="date" indexed="true" stored="true" />
  <field name="title" type="text_general" indexed="true" stored="true" />
  <field name="content" type="text_general" indexed="true" stored="true" />
  <!-- target of several copyField sources, so it must accept multiple values -->
  <field name="text" type="text_general" indexed="true" stored="true" multiValued="true" />
</fields>

• uniqueKey
– <uniqueKey>id</uniqueKey>
• solrQueryParser
– <solrQueryParser defaultOperator="OR"/>
• copyField
– <copyField source="title" dest="text"/>
– <copyField source="content" dest="text"/>
• dynamicField
– <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
– <dynamicField name="*_text" type="string" indexed="true" stored="true"/>
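
To make the interplay of explicit and dynamic fields concrete, here is a minimal Python sketch of how a field name could be resolved to a type. It is a simplified model, not Solr's implementation: the `resolve_field_type` helper is hypothetical, and trying longer patterns first is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic-field patterns copied from the schema above.
EXPLICIT_FIELDS = {
    "id": "string",
    "release_dt": "date",
    "title": "text_general",
    "content": "text_general",
    "text": "text_general",
}
DYNAMIC_FIELDS = [("*_dt", "date"), ("*_text", "string")]


def resolve_field_type(name):
    """Return the field type for a field name, or None if nothing matches."""
    # An explicitly declared field wins over any dynamic pattern.
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    # Otherwise try the dynamic patterns, longest pattern first (assumed rule).
    for pattern, field_type in sorted(DYNAMIC_FIELDS, key=lambda p: -len(p[0])):
        if fnmatchcase(name, pattern):
            return field_type
    return None


for field in ("release_dt", "created_dt", "summary_text", "price"):
    print(field, "->", resolve_field_type(field))
```

A name such as `created_dt` is not declared anywhere, yet it still resolves to `date` through the `*_dt` pattern, while `price` matches nothing and would be rejected.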

Schema.xml
• Example for bigram analyzer
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- the filters need a tokenizer in front of them -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

• Dynamically Reload
$ curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'

[e.g. $ curl 'http://localhost:8981/solr/admin/cores?action=RELOAD&core=news']
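
The idea behind the bigram analyzer above can be sketched in a few lines of Python. This is only an illustration of overlapping character bigrams; the `cjk_bigrams` helper is hypothetical and, unlike the real filter, it does not check which script a character belongs to.

```python
def cjk_bigrams(text):
    """Split text on whitespace and emit overlapping character bigrams per chunk."""
    tokens = []
    for chunk in text.split():
        if len(chunk) == 1:
            tokens.append(chunk)  # a lone character is kept as a unigram
        else:
            tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


print(cjk_bigrams("아이패드"))  # ['아이', '이패', '패드']
```

Because every adjacent pair of characters becomes a term, a query for part of a word can still match without a language-specific dictionary.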

Configuration Files
1. Edit the solr.xml configuration file in the solr directory
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" defaultCoreName="core1">
    <core name="core1" instanceDir="core_dir1" />
    <core name="core2" instanceDir="core_dir2" />
  </cores>
</solr>

2. Create a home directory for each core under the solr directory
- solr
  - core_dir1
  - core_dir2

3. Create conf and data directories inside each core directory. The data path
can be set in solrconfig.xml with the following element:
<dataDir>${solr.data.dir:}</dataDir>
- solr
  - core_dir1
    - conf
    - data
  - core_dir2
    - conf
    - data
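
Steps 2 and 3 can be scripted. The following Python sketch creates the layout shown above; the `create_core_dirs` helper is hypothetical, and the demonstration runs in a temporary directory rather than a real Solr home.

```python
from pathlib import Path
import tempfile


def create_core_dirs(solr_home, core_dirs):
    """Create <solr_home>/<core_dir>/conf and .../data for every core directory."""
    created = []
    for core_dir in core_dirs:
        for sub in ("conf", "data"):
            path = Path(solr_home) / core_dir / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# Demonstration in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_core_dirs(Path(tmp) / "solr", ["core_dir1", "core_dir2"]):
        print(path.relative_to(tmp))
```

The instanceDir values from solr.xml (core_dir1, core_dir2) are the only inputs, so the script and the configuration are less likely to drift apart.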

Web Admin Interface
• View config, schema, and distribution information
• Query interface
• Statistics
– Caches: lookups, hits, hitratio, inserts, evictions, size
– RequestHandlers: requests, errors
– UpdateHandler: adds, deletes, commits, optimizes
– IndexReader, open-time, index-version, numDocs, maxDocs
• Analysis debugger
– Shows the output of each analysis stage
– Shows how the query terms match the indexed terms

XML
• Document
<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office">Bridgewater</field>
    <field name="skills">Perl</field>
    <field name="skills">Java</field>
  </doc>
</add>

<delete>
  <id>05991</id>
  <id>06000</id>
  <query>office:Bridgewater</query>
  <query>office:Osaka</query>
</delete>

• Indexing
$ curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" \
  --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'

• Update
$ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
  --data-binary '<add><doc boost="2.5"><field name="employeeId">05991</field>
  <field name="office" boost="2.0">Bridgewater</field> </doc> </add>'

• Commit
$ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
  --data-binary '<commit waitFlush="false" waitSearcher="false"/>'
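
The same XML update message can be generated from code instead of typed by hand. This Python sketch builds an add message and prepares the HTTP request without sending it; `build_add_xml` and `build_update_request` are hypothetical helpers, and the base URL assumes a local single-core Solr as in the curl examples.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict of field values."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list value becomes one <field> element per item (multi-valued field).
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def build_update_request(xml_body, base_url="http://localhost:8983/solr"):
    """Prepare (but do not send) a POST to the update handler with commit=true."""
    return urllib.request.Request(
        base_url + "/update?commit=true",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )


body = build_add_xml(
    {"employeeId": "05991", "office": "Bridgewater", "skills": ["Perl", "Java"]}
)
print(body)
request = build_update_request(body)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

Building the message with an XML library also escapes special characters in field values, which string concatenation in a shell command does not.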

Json
• Document
[
  { "id" : "MyTestDocument",
    "title" : "This is just a test" }
]

• Indexing
$ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
[ { "id" : "MyTestDocument", "title" : "This is just a test" } ]'

• Update/Delete
$ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
{
  "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
  "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} },
  "delete": {"id" : "TestDoc1"},
  "delete": {"query" : "Test", "commitWithin" : "500"}
}'

• Commit
$ curl 'http://localhost:8983/solr/update?commit=true'
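
For the JSON variant, a Python sketch of the indexing call might look like this. The `build_json_update_request` helper is hypothetical, the base URL assumes a local Solr as in the curl examples, and the request is only prepared, not sent.

```python
import json
import urllib.request


def build_json_update_request(docs, base_url="http://localhost:8983/solr"):
    """Prepare a POST of a JSON document list to /update/json with commit=true."""
    payload = json.dumps(docs, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update/json?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


docs = [{"id": "MyTestDocument", "title": "This is just a test"}]
request = build_json_update_request(docs)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

Serializing with `json.dumps` avoids the quoting mistakes that are easy to make when the payload is embedded in a shell command.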

CSV
• Document
[test.csv]
fieldnames=id,,category
100,"title","This Value is ""food"""

[test.csv]
fieldnames=id,title,category
100,"title","This Value is ""food"""

• Indexing
$ curl http://localhost:8983/solr/update/csv --data-binary @test.csv -H 'Content-type:text/plain; charset=utf-8'

• Example from MySQL Dump
$ curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=&stream.file=/tmp/result.text'
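
The doubled quotes in the sample row are standard CSV escaping for a quote character inside a quoted value. A short Python sketch with the standard `csv` module shows how such a row can be produced; the `to_csv` helper is hypothetical.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows to CSV text; embedded quotes are escaped by doubling them."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv([[100, "title", 'This Value is "food"']]))
```

The writer quotes only the values that need it, so `title` stays bare while the value containing quotes is wrapped and escaped.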

Full-import

• Example test database setup
Create database solr;
Grant alter, select, insert, update, delete on solr.* to solr@localhost identified by 'solr';

Create table maker (
mid int primary key auto_increment,
name varchar(30) not null,
lastmodified datetime );
Create table product (
id int primary key auto_increment,
mid int not null,
name varchar(30) not null,
hname varchar(30) not null,
lastmodified datetime );

Insert into maker(name, lastmodified) values('apple', '2012-05-11 17:00:00');
Insert into maker(name, lastmodified) values('sony', '2012-05-11 17:00:00');
Insert into maker(name, lastmodified) values('microsoft', '2012-05-11 17:00:00');

Insert into product(mid, name, hname, lastmodified) values(1, 'iphone', '아이폰', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(1, 'ipod', '아아팟', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(1, 'ipad', '아이패드', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(2, 'walkman', '워크맨', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(2, 'vaio', '바이오', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(3, 'windowsxp', '윈도우xp', '2012-05-11 17:00:00');
Insert into product(mid, name, hname, lastmodified) values(3, 'windowx7', '윈도우7', '2012-05-11 17:00:00');

Full-import

• MySQL connection settings
– In solrconfig.xml, specify the DB configuration file.
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

– In db-data-config.xml, define the SQL statements that load the data.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
  <document>
    <!-- hname is selected as well, because it is mapped to a field below -->
    <entity name="product" query="select id, mid, name, hname from product">
      <field column="id" name="pid" />
      <field column="mid" name="mid" />
      <field column="name" name="pname" />
      <field column="hname" name="hname" />
      <entity name="maker" query="select mid, name from maker where mid = '${product.mid}'">
        <field column="name" name="mname" />
      </entity>
    </entity>
  </document>
</dataConfig>

Full-import

• Index settings
– Define the search fields in schema.xml
<field name="pid" type="string" indexed="true" stored="true" required="true" />
<field name="mid" type="int" indexed="true" stored="true" multiValued="false" />
<field name="pname" type="text" indexed="true" stored="true" multiValued="true" />
<field name="mname" type="text" indexed="true" stored="true" multiValued="true" />
……..
<!-- only one defaultSearchField is allowed; pname and mname are both copied into text -->
<defaultSearchField>text</defaultSearchField>
……..
<uniqueKey>pid</uniqueKey>
……..
<copyField source="pname" dest="text"/>
<copyField source="mname" dest="text"/>

– Start Solr
java -Dsolr.solr.home="./example-DIH/solr/" -jar start.jar

– Run the import
http://localhost:8983/solr/db/dataimport?command=full-import

Delta-import

• Example test database setup
Insert into maker(name, lastmodified) values('Samsung', '2012-05-14 14:00:00');
Insert into maker(name, lastmodified) values('LG', '2012-05-14 14:00:00');

Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyS', '겔럭시S', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyA', '겔럭시A', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyNote', '겔럭시노트', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(5, 'OptimusLTE', '옵티머스LTE', '2012-05-14 14:00:00');
Insert into product(mid, name, hname, lastmodified) values(5, 'VegaLTE', '베가LTE', '2012-05-14 14:00:00');

Delta-import

• MySQL connection settings
– In db-data-config.xml, define the SQL statements that load the data.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
  <document>
    <entity name="product" pk="id"
            query="select * from product"
            deltaImportQuery="select * from product where id='${dataimporter.delta.id}'"
            deltaQuery="select id from product where lastmodified > '${dataimporter.last_index_time}'">
      <field column="id" name="pid" />
      <field column="name" name="pname" />
      <!-- name is selected as well, because it is mapped to mname below -->
      <entity name="maker" pk="mid"
              query="select mid, name from maker where mid='${product.mid}'">
        <field column="name" name="mname" />
      </entity>
    </entity>
  </document>
</dataConfig>

– Run the import
http://localhost:8983/solr/db/dataimport?command=delta-import
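
The two delta queries work as a pair: deltaQuery finds the primary keys changed since the last import, and deltaImportQuery re-reads each of those rows. The Python sketch below replays that logic against an in-memory SQLite database. It is an illustration only: the table is a simplified copy of the product table, and `delta_ids` and `delta_rows` are hypothetical helpers, not part of the DataImportHandler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table product ("
    "id integer primary key autoincrement, mid int not null, "
    "name varchar(30) not null, lastmodified datetime)"
)
conn.executemany(
    "insert into product(mid, name, lastmodified) values (?, ?, ?)",
    [
        (1, "iphone", "2012-05-11 17:00:00"),
        (2, "walkman", "2012-05-11 17:00:00"),
        (4, "GalaxyS", "2012-05-14 14:00:00"),
        (5, "OptimusLTE", "2012-05-14 14:00:00"),
    ],
)


def delta_ids(connection, last_index_time):
    """deltaQuery step: primary keys of rows modified after the last import."""
    cursor = connection.execute(
        "select id from product where lastmodified > ? order by id",
        (last_index_time,),
    )
    return [row[0] for row in cursor]


def delta_rows(connection, ids):
    """deltaImportQuery step: re-read each changed row by primary key."""
    rows = []
    for product_id in ids:
        rows.extend(connection.execute(
            "select id, mid, name from product where id = ?", (product_id,)
        ).fetchall())
    return rows


changed = delta_ids(conn, "2012-05-12 00:00:00")
print(changed)
print(delta_rows(conn, changed))
```

Only the rows inserted on 2012-05-14 are picked up, which is why a delta-import after the second batch of inserts indexes just the new products.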

Index
• Delete all existing data
$ java -Durl="http://localhost:$port/solr/update/?commit=true" \
  -Ddata=args -jar $dir/post.jar "<delete><query>*:*</query></delete>"

• Index with post.jar as follows
$ java -Durl=http://localhost:8983/solr/core1/update/?commit=true -jar post.jar core1_data.xml
$ java -Durl=http://localhost:8983/solr/core2/update/?commit=true -jar post.jar core1_data.xml

※ Note
– When creating the index file for the first time
<doc>
  <field name="id">id1</field>
  <field name="title">title1</field>
</doc>

– When updating the index file
<update>
  <doc>
    <field name="id">id1</field>
    <field name="title">title1</field>
  </doc>
</update>

Search Parameter
Parameter    Default  Description
q                     Search query. e.g. q=video or q=title:spiderman^10 text:spiderman
start        0        Offset into the list of results
rows         10       Number of result documents to return
fl           *        Fields to return (comma-separated). e.g. fl=*,score or fl=id,name
qf                    Fields to search (dismax query fields). e.g. q=superman&qf=title subject
sort                  Fields to sort by, ascending or descending. e.g. sort=inStock asc, price desc or sort=price asc
wt                    Writer type (response format). e.g. wt=json or wt=xml
fq                    Filter query (search within results). e.g. q=video&fq=superman
hl                    Highlighting. e.g. hl=true&hl.fl=name,description
facet                 Faceted search. e.g. facet=true&facet.field=cat or facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]
debugQuery            Adds debug information to the search response
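
Several of these parameters contain spaces, brackets, or carets, so they must be URL-encoded when sent over HTTP. A minimal Python sketch of assembling such a request URL follows; the `build_select_url` helper and the `/solr/select` path are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(params, base_url="http://localhost:8983/solr/select"):
    """Build a search URL; params is a list of pairs so a name may repeat."""
    return base_url + "?" + urlencode(params)


url = build_select_url([
    ("q", "title:spiderman^10 text:spiderman"),
    ("start", 0),
    ("rows", 10),
    ("fl", "*,score"),
    ("wt", "json"),
    ("facet", "true"),
    ("facet.field", "cat"),
    ("facet.query", "price:[0 TO 100]"),
    ("facet.query", "price:[100 TO *]"),
])
print(url)

# Decoding the query string again recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["facet.query"])
```

A list of pairs is used instead of a dict because parameters such as facet.query and fq may legitimately appear more than once.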

Query Examples
• Search for documents containing mission or impossible, sorted by releaseDate in descending order
– q=mission impossible; releaseDate desc
• Search for documents that contain mission and do not have cruise in actor
– q=+mission -actor:cruise
• Search for documents containing the phrase mission impossible and without cruise in actor
– q="mission impossible" -actor:cruise
• Boost spiderman in title by a factor of 10 over spiderman in description
– q=title:spiderman^10 description:spiderman
• Search for documents where spiderman and movie occur within 10 words of each other in the description field
– q=description:"spiderman movie"~10
• Search for documents that must contain HDTV and have a weight of 40 or more
– q=+HDTV +weight:[40 TO *]
• Wildcard queries
– q=te?t
– q=te*t
– q=test*
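
The wildcard semantics can be illustrated outside Solr: `?` stands for exactly one character and `*` for zero or more. The Python sketch below translates such a pattern into a regular expression and filters a term list. It is a simplified model with hypothetical helper names; in Solr the pattern is matched against the indexed terms, after analysis.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matching_terms(pattern, terms):
    """Return the terms that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [term for term in terms if regex.fullmatch(term)]


terms = ["test", "text", "tet", "tent", "tenant", "tests", "testing"]
for pattern in ("te?t", "te*t", "test*"):
    print(pattern, "->", matching_terms(pattern, terms))
```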

Search Relevancy

Faceted Browsing

Suggest
• Configuration
– Add the suggest component to solrconfig.xml.
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Suggest
• Configuration
– Add the suggest field to schema.xml.
<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
<copyField source="name" dest="name_autocomplete" />

• Run a search (build the dictionary first: http://localhost:8983/solr/db/suggest?spellcheck.build=true)

http://localhost:8983/solr/db/suggest?q=겔
http://localhost:8983/solr/db/suggest?q=윈도
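
Conceptually, a suggest request is a prefix lookup over the terms of the suggest field. The Python sketch below shows that idea with a sorted list and binary search, which is a deliberate simplification and not the ternary search tree that TSTLookup uses. The `suggest` helper and the term list are assumptions taken from the test data in this deck, and the real Suggester may also rank candidates by weight.

```python
from bisect import bisect_left


def suggest(prefix, sorted_terms, count=10):
    """Return up to count terms from a sorted list that start with prefix."""
    start = bisect_left(sorted_terms, prefix)
    results = []
    # All terms sharing the prefix sit next to each other in sorted order.
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        results.append(term)
        if len(results) == count:
            break
    return results


terms = sorted(["겔럭시S", "겔럭시A", "겔럭시노트", "윈도우xp", "윈도우7", "아이폰"])
print(suggest("겔", terms))
print(suggest("윈도", terms))
```

The `count` argument plays the role of spellcheck.count in the request handler defaults above.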

Basic Dictionary
- Synonym/Stopword Dictionaries -

Synonym Dictionary
• Entries (synonyms.txt)
Window => windowxp window7 window8
window 7, door

• Test query [Query: window 7]
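
The two entry styles in synonyms.txt behave differently: a line with `=>` maps the left side to the right side, while a plain comma-separated line declares equivalent terms. The Python sketch below parses both styles. It is a simplified reading of the format with a hypothetical `parse_synonyms` helper; entries are assumed to be comma-separated, so a right-hand side written with spaces only is kept as one multi-word entry, and everything is lowercased to mirror ignoreCase="true".

```python
def parse_synonyms(lines):
    """Return (mappings, groups) parsed from synonyms.txt-style lines."""
    mappings, groups = {}, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = [t.strip().lower() for t in right.split(",") if t.strip()]
            for source in left.split(","):
                if source.strip():
                    mappings[source.strip().lower()] = targets
        else:
            groups.append([t.strip().lower() for t in line.split(",") if t.strip()])
    return mappings, groups


mappings, groups = parse_synonyms([
    "Window => windowxp window7 window8",
    "window 7, door",
])
print(mappings)
print(groups)
```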

Synonym Dictionary
• Test query [Query: window]

• Test query [Query: door]

Stopword Dictionary
• Entries (stopwords.txt)
Window
• Test query [Query: window 7]

• Test query [Query: window]

• Test query [Query: door]
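
What the stop filter does to the three test queries can be sketched in Python. The `remove_stopwords` helper is hypothetical; its `ignore_case` flag mirrors the ignoreCase="true" attribute of the StopFilterFactory in the schema, and tokenization is reduced to splitting on whitespace.

```python
def remove_stopwords(tokens, stopwords, ignore_case=True):
    """Drop every token that appears in the stopword list."""
    if ignore_case:
        stop = {word.lower() for word in stopwords}
        return [token for token in tokens if token.lower() not in stop]
    stop = set(stopwords)
    return [token for token in tokens if token not in stop]


stopwords = ["Window"]
for query in ("window 7", "window", "door"):
    print(query, "->", remove_stopwords(query.split(), stopwords))
```

With case ignored, the query window is reduced to nothing, while window 7 keeps only 7 and door passes through unchanged.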
