How to build a mail search engine using Nitro

By Kashia.

Part 5 - Importing mails into the database

We now have a model that we can use to put mails into our database. All we need now is a script that drives the import.

Add this to the end of app/mail/mail_utils.rb:

28 if __FILE__ == $0
29   require 'rubygems'
30   require 'og'
31   require 'model_mail.rb'
32   require 'facet/date/days_in_month'
33   require 'date' # Date.parse, Date.today
34   Og.create_schema = true
35   Og.setup(
36     :destroy => false,
37     :evolve_schema => true,
38     :store => :psql,
39     :name => 'mymails',
40     :user => 'johndoe',
41     :password => '',
42     :connection_count => 2
43     )
44   
45   url = 'http://rubyforge.org/pipermail/nitro-general/%s.txt'
46   date = Date.parse('2005-01-01') # must be the first of a month
47   
48   while date < Date.today
49     turl = url % date.strftime("%Y-%B")
50     
51     mails = ModEMailParser.parse(turl) # parse this month's archive
52     
53     # rescue per mail, so one duplicate doesn't skip the rest of the month
54     mails.each do |mail|
55       ma = IndexedMail.new(mail)
56       begin ma.save
57       rescue PGError => e
58         # ignore errors: those will be duplicate Message-IDs (so the same
59         # message won't be stored twice), or the mail provider didn't
60         # provide unique IDs.
61       end
62     end
63     
64     # advances a month
65     date += date.days_in_month
66     
67   end
68   
69 end

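Because we're lazy (well, I am...), we want to import all the emails of the mailing list in one go. The whole block sits inside if __FILE__ == $0 (line 28), so it only runs when you execute the file directly, not when the app requires it. We then loop month by month from January 2005 up until today (line 48). For each month, line 49 fills the year and full month name (e.g. 2005-January) into the url template to build the address of that month's archive. Nice little screen-scraper, ain't it?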
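If you'd rather not depend on facets for the month stepping, here is a minimal stdlib-only sketch of the same URL crafting. It swaps days_in_month for Ruby's Date#>>, which moves a date forward by whole months; the helper name archive_urls and the fixed end date in the usage line are made up for illustration.

```ruby
require 'date'

# URL template from the importer; %s receives e.g. "2005-January".
ARCHIVE_URL = 'http://rubyforge.org/pipermail/nitro-general/%s.txt'

# Hypothetical helper: returns one archive URL per month, starting at
# +start+ (the first of a month) and stopping before +stop+.
def archive_urls(start, stop = Date.today)
  urls = []
  date = start
  while date < stop
    urls << ARCHIVE_URL % date.strftime('%Y-%B')
    # Date#>> adds whole months, so no days_in_month arithmetic is needed
    date = date >> 1
  end
  urls
end

urls = archive_urls(Date.new(2005, 1, 1), Date.new(2005, 4, 1))
puts urls
```

With the importer's start date and Date.today as the end, it yields one URL per month up to the current one, just like the while loop.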

The next step is to parse that month's archive (line 51) and loop over the mails it contains (line 54). Lines 55 and 56 are where the most important thing happens: a new IndexedMail is created from each parsed mail and then saved into the database.
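The comment in the importer says duplicate Message-IDs should simply be skipped. Where the rescue sits decides how much gets skipped: around the whole loop, the first duplicate ends that month's import; around each save, only the offending mail is dropped. The sketch below shows the per-mail version without a database. DuplicateMessageId and FakeMailStore are hypothetical stand-ins for PGError and the Og-backed IndexedMail table, assuming that table rejects repeated Message-IDs through a unique constraint.

```ruby
# Hypothetical stand-in for the database's unique-constraint error (PGError).
class DuplicateMessageId < StandardError; end

# Hypothetical in-memory store that rejects a Message-ID it has already
# seen, the way a unique index on the message id column would.
class FakeMailStore
  def initialize
    @by_id = {}
  end

  def save(mail)
    id = mail[:message_id]
    raise DuplicateMessageId, id if @by_id.key?(id)
    @by_id[id] = mail
  end

  def size
    @by_id.size
  end
end

# Saves every mail and returns how many were skipped as duplicates.
def import(mails, store)
  skipped = 0
  mails.each do |mail|
    begin
      store.save(mail)
    rescue DuplicateMessageId
      skipped += 1 # drop only this mail and carry on with the next one
    end
  end
  skipped
end

store = FakeMailStore.new
mails = [
  { :message_id => '<a@example.org>', :subject => 'first' },
  { :message_id => '<a@example.org>', :subject => 'resent' },
  { :message_id => '<b@example.org>', :subject => 'second' }
]
skipped = import(mails, store)
puts "saved #{store.size}, skipped #{skipped}"
```

Running the same import a second time skips all three mails and leaves the store unchanged, which is what makes re-running the importer over already imported months harmless.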

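Run chmod +x app/mail/mail_utils.rb and you have a nice little importer, good to go for the Nitro mailing list. (Running it directly assumes the file starts with a shebang line such as #!/usr/bin/env ruby; otherwise start it with ruby app/mail/mail_utils.rb.) If you haven't already run it in your excitement, do it now :D Afterwards you've got like... thousands of mails in your database. Isn't that cool? :D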
