This past weekend I presented yet another “Rapid Prototyping with Solr” presentation, this time back in the saddle with the No Fluff, Just Stuff symposium in Raleigh, NC. I intentionally waited until the last minute to hack together a quick script to index some data I haven’t indexed before to demonstrate the ease at which one can grab Solr and immediately make some use out of it. This time around I cobbled together a simple Ruby script to index a directory full of rich (PDF, HTML, Word, etc) documents into a fresh Solr 3.3.0 install. Only a few seconds later I have my documents indexed, and even searchable through a user interface.

Here’s the steps I took:

  1. Download and “install” (aka unzip) Apache Solr 3.3.0
  2. Launch Solr (cd example; java -jar start.jar)
  3. Index files

That’s it.  Here’s the indexing script I used:

require 'net/http'

@dir = Dir.new("/Users/erikhatcher/apache-solr-3.3.0/docs")

@url = URI.parse("http://localhost:8983/solr")
@connection = Net::HTTP.new(@url.host, @url.port)

def index(filename)
@connection.get(@url.path + "/update/extract?stream.file=#{filename}&literal.id=#{filename}")
end

def commit
@connection.get(@url.path + "/update?commit=true")
end

@dir.each {|name|
  f = "#{@dir.path}/#{name}"
  if File.file?(f)
    puts "Indexing #{f}..."
    index(f)
  end
}

puts "Committing..."
commit

puts "Done!"


To make it look prettier, only a little dabbling with the templates is needed – add your company logo, customize the colors. And a change to the example (/browse handler) configuration to facet on content_type will allow you to easily search just within documents of specific types through the included UI.  The example code above indexed the docs that ship with Apache Solr 3.3.0; just change the path to a directory of yours to index your own content.

About Erik Hatcher

Read more from this author

LEARN MORE

Contact us today to learn how Lucidworks can help your team create powerful search and discovery applications for your customers and employees.