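Hello, this is tahara.
We have introduced Sunspot, which provides full-text search backed by Apache Solr, into いこーよ.
Until now we had been using MySQL full-text search (see our earlier posts), but when we tried Sunspot we found that:
- It is fast
- Facets are extremely convenient
- Kuromoji, a Japanese morphological analyzer, is available
So we overcame the psychological hurdles of switching and made the effort to migrate.
Solr 4.0.0 had just been released, so we decided to use:
- Solr 4.0.0
- sunspot 2.0.0.pre.120925
- sunspot_rails 2.0.0.pre.120925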
vi Gemfile
gem "sunspot_rails", "~> 2.0.0.pre.120925"
gem "sunspot", "~> 2.0.0.pre.120925"
bundle install
Add a searchable block to each model that should be covered by full-text search.
vi app/models/facility.rb
searchable do
  text(:name, :boost => 1.5)
  text(:kana)
  string(:kana)
  text(:region_name, :boost => 3)
  text(:prefecture_name, :boost => 3)
  text(:address, :boost => 2)
  text(:body) do
    "#{pr} #{description} #{tag_list} #{search_keyword} #{features.map(&:name).join(' ')} #{ages.map(&:name).join(' ')}"
  end
  string(:tag, :multiple => true) do
    tag_list
  end
  latlon(:location) { Sunspot::Util::Coordinates.new(lat, lng) }
  integer(:age_ids, :multiple => true)
  integer(:feature_ids, :multiple => true)
  integer(:prefecture_id)
  integer(:region_id)
  integer(:favorites_count)
  boolean(:has_picture) do
    picture_1_file_size.to_i > 0
  end
  boolean(:publish)
  boolean(:coupon_enabled)
  float(:rating) do
    # For facilities that have reviews, the nightly batch that updates ratings
    # leaves ratings.created_at <> ratings.updated_at.
    if rating.created_at == rating.updated_at
      0
    else
      rating.overall_rating
    end
  end
  boost { coupon_enabled? ? 3.0 : 1.0 }
  time :created_at
end

def self.default_search_scope(solr, params)
  params = HashWithIndifferentAccess.new(params) unless HashWithIndifferentAccess === params
  solr.all_of do
    if params[:publish].blank?
      with(:publish, true)
    else
      with(:publish, params[:publish])
    end
    with(:age_ids, params[:age_ids]) if params[:age_ids].present?
    with(:feature_ids, params[:feature_ids]) if params[:feature_ids].present?
    with(:prefecture_id, params[:prefecture_ids]) if params[:prefecture_ids].present?
    with(:region_id, params[:region_ids]) if params[:region_ids].present?
    if params[:tags].present?
      params[:tags].each do |tag|
        with(:tag, tag) if tag.present?
      end
    end
  end
  solr.with(:location).in_radius(params[:lat], params[:lng], params[:distance] || 100, :bbox => true) if params[:lat].present? && params[:lng].present?
  solr.fulltext(params[:word]) if params[:word].present?
end
Some search patterns come up again and again, so we gather them in self.default_search_scope.
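To make the defaults easier to see at a glance, here is a minimal sketch in plain Ruby that lists the filters default_search_scope would apply for a given set of parameters. The helper name default_filters and its blank check are hypothetical and only approximate Rails' blank?; nothing here calls Sunspot, and the location and full-text parts are left out.

```ruby
# Hypothetical helper (not part of Sunspot or of our application): returns
# the filters applied by default_search_scope as plain [field, value] pairs.
def default_filters(params)
  # Accept both string and symbol keys, similar in spirit to
  # HashWithIndifferentAccess but without requiring ActiveSupport.
  params = params.each_with_object({}) { |(k, v), h| h[k.to_sym] = v }
  # Simplified stand-in for Rails' blank?: nil, false and empty values.
  blank = ->(v) { v.nil? || v == false || (v.respond_to?(:empty?) && v.empty?) }

  filters = []
  # Only published facilities are returned unless the caller says otherwise.
  filters << [:publish, blank.(params[:publish]) ? true : params[:publish]]

  # Request parameter name => indexed field name.
  { age_ids: :age_ids, feature_ids: :feature_ids,
    prefecture_ids: :prefecture_id, region_ids: :region_id }.each do |param, field|
    filters << [field, params[param]] unless blank.(params[param])
  end

  # Each non-empty tag becomes its own filter.
  Array(params[:tags]).each do |tag|
    filters << [:tag, tag] unless blank.(tag)
  end
  filters
end

p default_filters("prefecture_ids" => [13], "tags" => ["park", ""])
# prints [[:publish, true], [:prefecture_id, [13]], [:tag, "park"]]
```

The point of the sketch is that a request with no publish parameter still ends up with a publish filter, which is why the controller does not have to remember it.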
The controller that performs the search:
vi app/controllers/facilities_controller.rb
@facilities = Facility.search(:include => [:ages, :rating, :tags]) do
  Facility::default_search_scope(self, params)
  if params[:format] == 'rss'
    order_by :created_at, :desc
  elsif params[:lat].present?
    order_by_geodist(:location, params[:lat], params[:lng])
  else
    if params[:word].present?
      order_by :score, :desc
    end
    order_by :coupon_enabled, :desc
    order_by :rating, :desc
    order_by :has_picture, :desc
  end
  paginate(:page => params[:page], :per_page => params[:per_page])
  facet :region_id if params[:region_ids].blank? && params[:prefecture_ids].blank?
  facet :prefecture_id if params[:region_ids].present?
end
With facet, we get the number of hits per prefecture together with the search results. This feature is really handy.
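Conceptually, a facet is just a count of hits per field value. The sketch below reproduces that idea in plain Ruby; the Hit struct, the sample data, and the facet_counts helper are all made up for illustration, and no Sunspot code is involved.

```ruby
# Stand-in for search hits; in our application these would be Facility records.
Hit = Struct.new(:name, :prefecture_id)

hits = [
  Hit.new("Park A", 13),
  Hit.new("Zoo B", 13),
  Hit.new("Museum C", 27),
]

# Count hits per value of the given field, most frequent first. This is
# roughly the information a facet on that field carries.
def facet_counts(hits, field)
  hits.group_by { |hit| hit[field] }
      .map { |value, group| [value, group.size] }
      .sort_by { |value, count| [-count, value] }
end

facet_counts(hits, :prefecture_id).each do |value, count|
  puts "prefecture #{value}: #{count} hit(s)"
end
```

With Sunspot itself, the equivalent numbers should be available from the search result as @facilities.facet(:prefecture_id).rows, where each row exposes value and count. Solr computes them over the whole result set rather than only the current page, which is what makes them useful for "narrow down by prefecture" links.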
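That is all for the Rails side. Next comes the Solr side.
For schema.xml, we make a few changes to the one that ships with Sunspot.
Solr 4.0.0 appears to require a _version_ field, so we add it together with the long field type it uses: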
<field name="_version_" type="long" indexed="true" stored="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
To use Kuromoji, the Japanese morphological analyzer, we also change the analyzers of the text field type.
<fieldType name="text" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>
    <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Removes tokens with certain part-of-speech tags -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
    <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
    <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <!-- Lower-cases romaji characters -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Converts katakana to hiragana -->
    <filter class="org.apache.lucene.analysis.icu.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>
    <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Removes tokens with certain part-of-speech tags -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
    <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
    <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <!-- Lower-cases romaji characters -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Converts katakana to hiragana -->
    <filter class="org.apache.lucene.analysis.icu.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
  </analyzer>
</fieldType>
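To get a feel for what the last few filters in this chain do to a single token, here is a rough approximation in plain Ruby. The method names are ours, the logic is a simplification (in particular, ICU's Katakana-Hiragana rules cover more cases than a simple character shift), and the real filters run inside Solr on tokens produced by the Kuromoji tokenizer.

```ruby
# Approximates JapaneseKatakanaStemFilter: drop a trailing long sound mark
# (U+30FC) from all-katakana tokens of at least min_length characters.
def katakana_stem(token, min_length = 4)
  return token if token.length < min_length
  return token unless token.match?(/\A[ァ-ヶー]+\z/)
  token.end_with?("ー") ? token[0...-1] : token
end

# Approximates LowerCaseFilter.
def lower_case(token)
  token.downcase
end

# Approximates the Katakana-Hiragana transform by shifting each katakana
# character to the corresponding hiragana character.
def katakana_to_hiragana(token)
  token.tr("ァ-ヶ", "ぁ-ゖ")
end

# Apply the three steps in the same relative order as in schema.xml.
def normalize(token)
  katakana_to_hiragana(lower_case(katakana_stem(token)))
end

%w[サーバー コピー Solr].each do |token|
  puts "#{token} -> #{normalize(token)}"
end
# prints:
# サーバー -> さーば
# コピー -> こぴー
# Solr -> solr
```

Because the index and query analyzers are configured identically, a word written in katakana and the same word written in hiragana should end up as the same term whenever the tokenizer splits them the same way.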
We also need an init script to start Solr.
#! /bin/sh
### BEGIN INIT INFO
# Provides:          solr
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Apache Solr
# Description:       Apache Solr
#  sudo ln -s /var/www/outing/current/solr/etc/init.d/init.sh solr
#  sudo update-rc.d solr defaults
### END INIT INFO

# Author: antindi <dev@actindi.net>
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.

# Do NOT "set -e"

# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="Solr"
NAME=solr
PROCESS_NAME=java
SOLR_HOME=/var/www/outing/current/solr
DAEMON=/usr/bin/java
DAEMON_ARGS="-Xmx1024m -Djava.util.logging.config.file=etc/logging.properties -jar start.jar"
PIDFILE=/var/run/$NAME/$NAME.pid
LOG_DIR=/var/log/$NAME
BASE_DIR=/var/lib/$NAME
DATA_DIR=$BASE_DIR/data
SCRIPTNAME=/etc/init.d/$NAME
SOLR_USER=deployer

# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.2-14) to ensure that this file is present
# and status_of_proc is working.
. /lib/lsb/init-functions

#
# Function that starts the daemon/service
#
do_start()
{
    mkdir `dirname $PIDFILE` > /dev/null 2>&1 || true
    chown $SOLR_USER `dirname $PIDFILE`
    mkdir $LOG_DIR > /dev/null 2>&1 || true
    chown $SOLR_USER $LOG_DIR
    mkdir -p $DATA_DIR > /dev/null 2>&1 || true
    chown $SOLR_USER $DATA_DIR
    # Return
    #   0 if daemon has been started
    #   1 if daemon was already running
    #   2 if daemon could not be started
    start-stop-daemon -b -m -c $SOLR_USER -d $SOLR_HOME --start --quiet --pidfile $PIDFILE --exec $DAEMON --test > /dev/null \
        || return 1
    start-stop-daemon -b -m -c $SOLR_USER -d $SOLR_HOME --start --quiet --pidfile $PIDFILE --exec $DAEMON -- \
        $DAEMON_ARGS \
        || return 2
    # Add code here, if necessary, that waits for the process to be ready
    # to handle requests from services started subsequently which depend
    # on this one.  As a last resort, sleep for some time.
}

#
# Function that stops the daemon/service
#
do_stop()
{
    # Return
    #   0 if daemon has been stopped
    #   1 if daemon was already stopped
    #   2 if daemon could not be stopped
    #   other if a failure occurred
    start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --name $PROCESS_NAME
    RETVAL="$?"
    [ "$RETVAL" = 2 ] && return 2
    # Wait for children to finish too if this is a daemon that forks
    # and if the daemon is only ever run from this initscript.
    # If the above conditions are not satisfied then add some other code
    # that waits for the process to drop all resources that could be
    # needed by services started subsequently.  A last resort is to
    # sleep for some time.
    #start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
    #[ "$?" = 2 ] && return 2
    # Many daemons don't delete their pidfiles when they exit.
    rm -f $PIDFILE
    return "$RETVAL"
}

#
# Function that sends a SIGHUP to the daemon/service
#
do_reload() {
    #
    # If the daemon can reload its configuration without
    # restarting (for example, when it is sent a SIGHUP),
    # then implement that here.
    #
    start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $PROCESS_NAME
    return 0
}

case "$1" in
  start)
    [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
    do_start
    case "$?" in
        0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
        2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
    esac
    ;;
  stop)
    [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
    do_stop
    case "$?" in
        0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
        2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
    esac
    ;;
  status)
    status_of_proc "$DAEMON" "$NAME" && exit 0 || exit $?
    ;;
  #reload|force-reload)
    #
    # If do_reload() is not implemented then leave this commented out
    # and leave 'force-reload' as an alias for 'restart'.
    #
    #log_daemon_msg "Reloading $DESC" "$NAME"
    #do_reload
    #log_end_msg $?
    #;;
  restart|force-reload)
    #
    # If the "reload" option is implemented then remove the
    # 'force-reload' alias
    #
    log_daemon_msg "Restarting $DESC" "$NAME"
    do_stop
    case "$?" in
      0|1)
        do_start
        case "$?" in
            0) log_end_msg 0 ;;
            1) log_end_msg 1 ;; # Old process is still running
            *) log_end_msg 1 ;; # Failed to start
        esac
        ;;
      *)
        # Failed to stop
        log_end_msg 1
        ;;
    esac
    ;;
  *)
    #echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
    echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
    exit 3
    ;;
esac

:
sudo update-rc.d solr defaults
sudo service solr start
We deployed it to production this morning, and it seems to be running properly. What a relief.
We are hiring engineers. Please feel free to get in touch.