Introduction to Apache Solr




2012.06.29
윤도상
dsyoon@ncue.net
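Solr Features
 •   Schema
      –     Easily define the fields of the documents to index and the type of each field
      –     Uses Lucene Analyzers
      –     Supports dynamic fields
      –     Copy fields can combine several fields into a single searchable field
      –     Stop words and similar word lists can be configured through external files
 •   Query
      –     HTTP interface with selectable response formats such as XML/XSLT, JSON, Python, and Ruby
      –     Faceted search based on queries and field values
      –     Sort order can be specified in the query
      –     Search scoring is easy to configure
      –     Per-field boosts can be specified in the query
 •   Core
      –     Query handlers and an extensible XML format
      –     Duplicate document detection based on the unique key field
 •   Caching
      –     Configurable caches for query results, filters, and documents
      –     User-level cache configuration is supported
 •   Replication
      –     Efficient index distribution over rsync transport
 •   Admin Interface
      –     Reports cache, update, and query status
      –     Provides a debugger for text analyzers
      –     Provides a web query interface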




Architecture
Overall Architecture
Component
High Availability
Replication
Configure
Schema.xml
 • Overall
       <schema>
         <types> … </types>
         <fields> … </fields>
         <uniqueKey />
         <solrQueryParser />
         <copyField />
         <dynamicField />
       </schema>




Schema.xml
 •    Type
 <types>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
   <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
   <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" />
   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" />
        <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>
 </types>




Schema.xml
 •   Fields
      <fields>
        <field name="id" type="string" indexed="true" stored="true" required="true" />
        <field name="release_dt" type="date" indexed="true" stored="true" />
        <field name="title" type="text_general" indexed="true" stored="true" />
        <field name="content" type="text_general" indexed="true" stored="true" />
        <field name="text" type="text_general" indexed="true" stored="true" />
      </fields>

 •   uniqueKey
      –   <uniqueKey>id</uniqueKey>
 •   solrQueryParser
      –   <solrQueryParser defaultOperator="OR"/>
 •   copyField
      –   <copyField source="title" dest="text"/>
      –   <copyField source="content" dest="text"/>
 •   dynamicField
      –   <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
      –   <dynamicField name="*_text" type="string" indexed="true" stored="true"/>
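
A rough sketch of how a field name could be resolved against the explicit fields and the dynamicField patterns in this schema. The function name and the "explicit fields first, then patterns in declaration order" rule are assumptions made for illustration; Solr's real resolution logic is not reproduced here.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic-field patterns copied from the schema on this slide.
EXPLICIT_FIELDS = {
    "id": "string",
    "release_dt": "date",
    "title": "text_general",
    "content": "text_general",
    "text": "text_general",
}
DYNAMIC_FIELDS = [("*_dt", "date"), ("*_text", "string")]


def resolve_field_type(name):
    """Return the field type for `name`, or None if no rule applies.

    Assumption: explicit fields win, then the first matching pattern wins.
    """
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return field_type
    return None
```

With these rules, a document field named created_dt would be typed as date without being declared in <fields>.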




Schema.xml
 • Example for bigram analyzer
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.CJKWidthFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.CJKBigramFilterFactory"/>
       </analyzer>
    </fieldType>


 • Dynamically Reload
    $ curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'

    [e.g. $ curl 'http://localhost:8981/solr/admin/cores?action=RELOAD&core=news']
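
To make the bigram analyzer on this slide more concrete, here is a simplified sketch of overlapping two-character tokenization. It only approximates what a CJK bigram filter produces: it treats every whitespace-separated run the same way and ignores width folding and script detection. The function name is hypothetical.

```python
def cjk_bigrams(text):
    """Split text on whitespace and emit overlapping 2-character tokens per run.

    A run of length 1 is kept as a single token.
    """
    tokens = []
    for run in text.split():
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(cjk_bigrams("아이패드"))
```

Because every adjacent pair of characters becomes a token, a query for any two-character substring can match without a morphological dictionary.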




Multi-Core
Configuration Files
 1.   Edit the solr.xml configuration file in the solr directory
      <solr persistent="true" sharedLib="lib">
       <cores adminPath="/admin/cores" defaultCoreName="core1">
        <core name="core1" instanceDir="core_dir1" />
        <core name="core2" instanceDir="core_dir2" />
       </cores>
      </solr>

 2.   Create a home directory for each core under the solr directory
      - solr
               - core_dir1
               - core_dir2

 3.   Create conf and data directories inside each directory created above. The data path
      can be set in solrconfig.xml with the following element:
      <dataDir>${solr.data.dir:}</dataDir>
      - solr
               - core_dir1
                     - conf
                     - data
               - core_dir2
                     - conf
                     - data
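
Steps 2 and 3 above can be scripted. The following is a minimal sketch that only creates the directory layout; the function name is hypothetical, and copying solrconfig.xml and schema.xml into each conf directory is left out.

```python
from pathlib import Path


def create_core_dirs(solr_home, core_dirs):
    """Create <solr_home>/<core_dir>/conf and .../data for every core directory."""
    created = []
    for core_dir in core_dirs:
        for sub in ("conf", "data"):
            path = Path(solr_home) / core_dir / sub
            # parents=True also creates the core directory itself.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

For the layout on this slide the call would be create_core_dirs("solr", ["core_dir1", "core_dir2"]).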
Web Admin Interface
Web Admin Interface
  •   View Config, Schema, and Distribution information
  •   Query Interface
  •   Statistics
       –   Caches: lookups, hits, hitratio, inserts, evictions, size
       –   RequestHandlers: requests, errors
       –   UpdateHandler: adds, deletes, commits, optimizes
       –   IndexReader: open-time, index-version, numDocs, maxDocs
  •   Analysis Debugger
       –   Shows the output of each analysis step
       –   Shows how query terms match the indexed terms




Solr Document
XML
 • Document
      <add>
        <doc>
          <field name="employeeId">05991</field>
          <field name="office">Bridgewater</field>
          <field name="skills">Perl</field>
          <field name="skills">Java</field>
        </doc>
      </add>

      <delete>
        <id>05991</id>
        <id>06000</id>
        <query>office:Bridgewater</query>
        <query>office:Osaka</query>
      </delete>


 • Indexing
      $ curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" \
             --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'


 • Update
      $ curl 'http://localhost:8983/solr/update' -H "Content-Type: text/xml" \
             --data-binary '<add><doc boost="2.5"><field name="employeeId">05991</field>
                            <field name="office" boost="2.0">Bridgewater</field></doc></add>'

 • Commit
      $ curl 'http://localhost:8983/solr/update' -H "Content-Type: text/xml" \
             --data-binary '<commit waitFlush="false" waitSearcher="false"/>'
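
Hand-writing the <add> message gets error-prone once field values contain characters such as < or &. A small sketch of generating it with Python's standard library follows; the function name is hypothetical and only the <add><doc><field> shape shown on this slide is covered (no boosts).

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    A list value produces one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")


print(build_add_xml({"employeeId": "05991", "office": "Bridgewater",
                     "skills": ["Perl", "Java"]}))
```

The resulting string can be passed to curl with --data-binary as in the Indexing example.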




JSON
 • Document
       [
           { "id" : "MyTestDocument",
             "title" : "This is just a test" }
       ]


 • Indexing
       $ curl 'http://localhost:8983/solr/update/json' -H 'Content-type:application/json' -d \
              '[ { "id" : "MyTestDocument", "title" : "This is just a test" } ]'


 • Update/Delete
       $ curl 'http://localhost:8983/solr/update/json' -H 'Content-type:application/json' -d \
              '{
                    "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
                    "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} },
                    "delete": {"id" : "TestDoc1"},
                    "delete": {"query" : "Test", "commitWithin" : 500}
                 }'



 • Commit
       $ curl 'http://localhost:8983/solr/update?commit=true'
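
The Update/Delete body repeats the "add" and "delete" keys, which a Python dict cannot represent. One way to generate such a body is to serialize each command separately and join the pieces, as in this sketch. The function name is hypothetical; only the standard json module is used.

```python
import json


def build_update_commands(commands):
    """Serialize (command, body) pairs into one JSON object.

    Keys may repeat, so the object is assembled as text instead of from a dict.
    """
    parts = [
        f"{json.dumps(name)}: {json.dumps(body, ensure_ascii=False)}"
        for name, body in commands
    ]
    return "{" + ", ".join(parts) + "}"


payload = build_update_commands([
    ("add", {"doc": {"id": "TestDoc1", "title": "test1"}}),
    ("add", {"doc": {"id": "TestDoc2", "title": "another test"}}),
    ("delete", {"id": "TestDoc1"}),
])
print(payload)
```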




CSV
 • Document
      [test.csv]
      fieldnames=id,,category
      100,"title","This Value is ""food"""

      [test.csv]
      fieldnames=id,title,category
      100,"title","This Value is ""food"""



 • Indexing
      $ curl 'http://localhost:8983/solr/update/csv' --data-binary @test.csv -H 'Content-type:text/plain; charset=utf-8'




 • Example from MySQL Dump
      $ curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=&stream.file=/tmp/result.text'
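
The doubled quotes in the sample row are easy to get wrong by hand. As a sketch, Python's csv module applies the same quote-doubling convention automatically; the function name is hypothetical and the row is the one from the Document example.

```python
import csv
import io


def to_csv(rows):
    """Render rows as CSV text; embedded double quotes are doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv([[100, "title", 'This Value is "food"']]))
```

QUOTE_MINIMAL only quotes fields that need it, so the output differs cosmetically from the slide (title is left unquoted) but parses to the same values.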




Data Handler Interface
Full-import

  • Example test database setup
       Create database solr;
       Grant alter, select, insert, update, delete on solr.* to solr@localhost identified by 'solr';

       Create table maker (
                     mid int primary key auto_increment,
                     name varchar(30) not null,
                     lastmodified datetime );
       Create table product (
                     id int primary key auto_increment,
                     mid int not null,
                     name varchar(30) not null,
                     hname varchar(30) not null,
                     lastmodified datetime );

       Insert into maker(name, lastmodified) values('apple', '2012-05-11 17:00:00');
       Insert into maker(name, lastmodified) values('sony', '2012-05-11 17:00:00');
       Insert into maker(name, lastmodified) values('microsoft', '2012-05-11 17:00:00');

       Insert into product(mid, name, hname, lastmodified) values(1, 'iphone', '아이폰', '2012-05-11 17:00:00');
       Insert into product(mid, name, hname, lastmodified) values(1, 'ipod', '아이팟', '2012-05-11 17:00:00');
       Insert into product(mid, name, hname, lastmodified) values(1, 'ipad', '아이패드', '2012-05-11 17:00:00');
       Insert into product(mid, name, hname, lastmodified) values(2, 'walkman', '워크맨', '2012-05-11 17:00:00');
       Insert into product(mid, name, hname, lastmodified) values(2, 'vaio', '바이오', '2012-05-11 17:00:00');
       Insert into product(mid, name, hname, lastmodified) values(3, 'windowsxp', '윈도우xp', '2012-05-11 17:00:00');
       Insert into product(mid, name, hname, lastmodified) values(3, 'windows7', '윈도우7', '2012-05-11 17:00:00');




Full-import

  • MySQL connection settings
      – Specify the DB configuration file in solrconfig.xml.
          <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
             <lst name="defaults">
                <str name="config">db-data-config.xml</str>
             </lst>
          </requestHandler>


      – Define the SQL queries for the data in db-data-config.xml.
          <dataConfig>
            <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                         url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
            <document>
               <entity name="product" query="select id, mid, name, hname from product">
                 <field column="id" name="pid" />
                 <field column="mid" name="mid" />
                 <field column="name" name="pname" />
                 <field column="hname" name="hname" />
                 <entity name="maker" query="select mid, name from maker where mid = '${product.mid}'">
                    <field column="mid" name="mid" />
                    <field column="name" name="mname" />
                 </entity>
               </entity>
            </document>
          </dataConfig>
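
The nested entity means: for every product row, run the maker query with that row's mid and merge the result into the same document. The sketch below imitates that with an in-memory SQLite database standing in for MySQL; the function name, the reduced table definitions, and the two sample rows are assumptions made for illustration.

```python
import sqlite3


def full_import(conn):
    """Build one document per product row, enriched by a per-row maker lookup."""
    docs = []
    outer = conn.execute("SELECT id, mid, name, hname FROM product ORDER BY id")
    for pid, mid, pname, hname in outer:
        doc = {"pid": str(pid), "mid": mid, "pname": pname, "hname": hname}
        # Plays the role of: select mid, name from maker where mid = '${product.mid}'
        maker = conn.execute("SELECT name FROM maker WHERE mid = ?", (mid,)).fetchone()
        if maker:
            doc["mname"] = maker[0]
        docs.append(doc)
    return docs


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maker (mid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (id INTEGER PRIMARY KEY, mid INTEGER NOT NULL,
                      name TEXT NOT NULL, hname TEXT NOT NULL);
INSERT INTO maker(name) VALUES ('apple');
INSERT INTO maker(name) VALUES ('sony');
INSERT INTO product(mid, name, hname) VALUES (1, 'iphone', '아이폰');
INSERT INTO product(mid, name, hname) VALUES (2, 'vaio', '바이오');
""")
docs = full_import(conn)
```

Note that the inner query runs once per outer row, which is why nested entities can become slow on large tables.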



Full-import

  • Index settings
      – Define the search fields in schema.xml
          <field name="pid" type="string" indexed="true" stored="true" required="true" />
          <field name="mid" type="int" indexed="true" stored="true" multiValued="false" />
          <field name="pname" type="text" indexed="true" stored="true" multiValued="true" />
          <field name="mname" type="text" indexed="true" stored="true" multiValued="true" />
          ……..
          <defaultSearchField>pname</defaultSearchField>
          <defaultSearchField>mname</defaultSearchField>
          ……..
          <uniqueKey>pid</uniqueKey>
          ……..
          <copyField source="pname" dest="text"/>
          <copyField source="mname" dest="text"/>


      – Start Solr
          java -Dsolr.solr.home="./example-DIH/solr/" -jar start.jar



      – Run the import
          http://localhost:8983/solr/db/dataimport?command=full-import




Delta-import

  • Example test database setup
      Insert into maker(name, lastmodified) values('Samsung', '2012-05-14 14:00:00');
      Insert into maker(name, lastmodified) values('LG', '2012-05-14 14:00:00');

      Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyS', '겔럭시S', '2012-05-14 14:00:00');
      Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyA', '겔럭시A', '2012-05-14 14:00:00');
      Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyNote', '겔럭시노트', '2012-05-14 14:00:00');
      Insert into product(mid, name, hname, lastmodified) values(5, 'OptimusLTE', '옵티머스LTE', '2012-05-14 14:00:00');
      Insert into product(mid, name, hname, lastmodified) values(5, 'VegaLTE', '베가LTE', '2012-05-14 14:00:00');




Delta-import

  • MySQL connection settings
      – Define the SQL queries for the data in db-data-config.xml.
        <dataConfig>
          <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                      url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
          <document>
             <entity name="product" pk="id"
                    query="select * from product"
                    deltaImportQuery="select * from product where id='${dataimporter.delta.id}'"
                    deltaQuery="select id from product where lastmodified > '${dataimporter.last_index_time}'">
               <field column="id" name="pid" />
               <field column="mid" name="mid" />
               <field column="name" name="pname" />
               <entity name="maker" pk="mid"
                       query="select mid, name from maker where mid='${product.mid}'">
                  <field column="mid" name="mid" />
                  <field column="name" name="mname" />
               </entity>
             </entity>
          </document>
        </dataConfig>


      – Run the import
        http://localhost:8983/solr/db/dataimport?command=delta-import
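
The deltaQuery selects the ids of rows changed since the last import, and deltaImportQuery then fetches each of those rows. A minimal sketch of the first step, with rows held in memory rather than in MySQL; the function name and the cutoff value are assumptions for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def delta_ids(rows, last_index_time):
    """Return ids of rows whose lastmodified is later than last_index_time.

    rows: iterable of (id, lastmodified) with timestamps as 'YYYY-MM-DD HH:MM:SS'.
    """
    cutoff = datetime.strptime(last_index_time, TIMESTAMP_FORMAT)
    return [
        row_id
        for row_id, modified in rows
        if datetime.strptime(modified, TIMESTAMP_FORMAT) > cutoff
    ]


rows = [(1, "2012-05-11 17:00:00"), (8, "2012-05-14 14:00:00"), (9, "2012-05-14 14:00:00")]
changed = delta_ids(rows, "2012-05-12 00:00:00")
```

Only the rows inserted on 2012-05-14 are picked up, so a delta import after the full import would touch just the new products.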




Index
Index
  •   Delete all existing data
        $ java -Durl="http://localhost:$port/solr/update/?commit=true" \
              -Ddata=args -jar $dir/post.jar "<delete><query>*:*</query></delete>"

  •   Index with post.jar as follows
        $ java -Durl='http://localhost:8983/solr/core1/update/?commit=true' -jar post.jar core1_data.xml
        $ java -Durl='http://localhost:8983/solr/core2/update/?commit=true' -jar post.jar core1_data.xml



  ※ Note
        – When creating the index for the first time
               <doc>
                 <field name="id">id1</field>
                 <field name="title">title1</field>
               </doc>

        – When updating the index
               <update>
                 <doc>
                   <field name="id">id1</field>
                   <field name="title">title1</field>
                 </doc>
               </update>



Search
Search Parameter
    Parameter    Default   Description

    q                      Search query. e.g. q=video or q=title:spiderman^10 text:spiderman
    start        0         Offset into the list of search results
    rows         10        Number of result documents to return
    fl           *         Fields to return (comma-separated field names).
                           e.g. fl=*,score or fl=id,name
    qf                     Fields to search (query fields). e.g. q=superman&qf=title subject
    sort                   Fields to sort by, ascending or descending.
                           e.g. sort=inStock asc, price desc or sort=price asc
    wt                     Writer type (response format). e.g. wt=json or wt=xml
    fq                     Filter query (search within results). e.g. q=video&fq=superman
    hl                     Highlighting. e.g. hl=true&hl.fl=name,description
    facet                  Faceted search. e.g. facet=true&facet.field=cat
                           facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]
    debugQuery             Adds debug information to the search results
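
Values such as title:spiderman^10 or price:[0 TO 100] must be URL-encoded before they are sent. The sketch below builds a query URL from the parameters in the table with the standard library; the function name, the /select path, and the base URL are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, **params):
    """Return base_url + '/select?' + URL-encoded parameters.

    A list value is repeated as several parameters (useful for fq or facet.query).
    """
    return f"{base_url}/select?{urlencode(params, doseq=True)}"


url = build_select_url(
    "http://localhost:8983/solr",
    q="title:spiderman^10 text:spiderman",
    start=0,
    rows=10,
    fl="id,name",
    wt="json",
)
print(url)
```

Parameter names containing a dot, such as facet.field, can be passed with dictionary unpacking: build_select_url(base, **{"facet": "true", "facet.field": "cat"}).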


Query Examples
 •   Documents containing mission or impossible, sorted by releaseDate in descending order
     – q=mission impossible&sort=releaseDate desc
 •   Documents that contain mission and whose actor field does not contain cruise
     – q=+mission -actor:cruise
 •   Documents containing the phrase "mission impossible" and whose actor field does not contain cruise
     – q="mission impossible" -actor:cruise
 •   Boost spiderman in title by a factor of 10 relative to spiderman in description
     – q=title:spiderman^10 description:spiderman
 •   Documents where spiderman and movie occur within 10 words of each other in the description field
     – q=description:"spiderman movie"~10
 •   Documents that must contain HDTV and have a weight of 40 or more
     – q=+HDTV +weight:[40 TO *]
 •   Wildcard queries
     – q=te?t
     – q=te*t
     – q=test*



Search Relevancy
Faceted Browsing
Autocomplete
Suggest
 •   Configuration
     – Add the suggest component to solrconfig.xml.
          <searchComponent name="suggest" class="solr.SpellCheckComponent">
            <lst name="spellchecker">
              <str name="name">suggest</str>
              <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
              <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
              <str name="field">name_autocomplete</str>
            </lst>
          </searchComponent>
          <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
            <lst name="defaults">
              <str name="spellcheck">true</str>
              <str name="spellcheck.dictionary">suggest</str>
              <str name="spellcheck.count">10</str>
            </lst>
            <arr name="components">
              <str>suggest</str>
            </arr>
          </requestHandler>




Suggest
 •    Configuration
       – Add the suggest field to schema.xml.
            <fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
              <analyzer>
                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                         catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                 <filter class="solr.LowerCaseFilterFactory"/>
              </analyzer>
            </fieldType>
            <field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
            <copyField source="name" dest="name_autocomplete" />


 •    Run a search (http://localhost:8983/solr/db/suggest?spellcheck.build=true)

     http://localhost:8983/solr/db/suggest?q=겔
     http://localhost:8983/solr/db/suggest?q=윈도
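
Conceptually, the suggester answers "which indexed terms start with this prefix?". The sketch below does the same with a sorted list and binary search. It stands in for, and does not reproduce, the TSTLookup implementation configured above; the function name and the sample terms (lowercased product names from the test data) are assumptions.

```python
from bisect import bisect_left


def suggest(terms, prefix, count=10):
    """Return up to `count` distinct terms starting with `prefix`, in sorted order."""
    ordered = sorted(set(terms))
    # All terms sharing the prefix form one contiguous block starting here.
    start = bisect_left(ordered, prefix)
    matches = []
    for term in ordered[start:]:
        if not term.startswith(prefix) or len(matches) == count:
            break
        matches.append(term)
    return matches


TERMS = ["겔럭시s", "겔럭시a", "겔럭시노트", "윈도우xp", "윈도우7", "아이폰"]
print(suggest(TERMS, "겔"))
print(suggest(TERMS, "윈도"))
```

The count argument mirrors spellcheck.count in the request handler defaults.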




Basic Dictionary
- Synonym / Stopword Dictionaries -
Synonym Dictionary
 • Entries (synonyms.txt)
      Window => windowxp window7 window8
      window 7, door

 • Test query [Query: window 7]




 • Test query [Query: window]
 • Test query [Query: door]
Stopword Dictionary
 • Entries (stopwords.txt)
      Window
 • Test query [Query: window 7]




 • Test query [Query: window]


 • Test query [Query: door]
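
What the stopword entry does to the three test queries can be sketched as a simple token filter. This assumes the query has already been split into tokens and that matching ignores case, as with ignoreCase="true" in the text_general type; the function name is hypothetical.

```python
def remove_stopwords(tokens, stopwords, ignore_case=True):
    """Drop every token that appears in the stopword list."""
    if ignore_case:
        stop = {word.lower() for word in stopwords}
        return [token for token in tokens if token.lower() not in stop]
    stop = set(stopwords)
    return [token for token in tokens if token not in stop]


STOPWORDS = ["Window"]
for query in ("window 7", "window", "door"):
    print(query, "->", remove_stopwords(query.split(), STOPWORDS))
```

With case ignored, the query "window" loses its only term, while "door" passes through unchanged.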




