We now have a model that we can use to put stuff into our database. All that's left is a script to drive that process.
app/mail/mail_utils.rb:

    28 if __FILE__ == $0
    29   require 'rubygems'
    30   require 'og'
    31   require 'model_mail.rb'
    32   require 'facet/date/days_in_month'
    33
    34   Og.create_schema = true
    35   Og.setup(
    36     :destroy => false,
    37     :evolve_schema => true,
    38     :store => :psql,
    39     :name => 'mymails',
    40     :user => 'johndoe',
    41     :password => '',
    42     :connection_count => 2
    43   )
    44
    45   url = 'http://rubyforge.org/pipermail/nitro-general/%s.txt'
    46   date = Date.parse('2005-01-01') # must be the first of a month
    47
    48   while date < Date.today
    49     turl = url % date.strftime("%Y-%B")
    50
    51     mails = ModEMailParser.parse(turl)
    52
    53     begin
    54       mails.each do |mail|
    55         ma = IndexedMail.new(mail)
    56         ma.save
    57       end
    58     rescue PGError => e
    59       # ignore errors. those will be duplicated Message-IDs, meaning same
    60       # message won't get parsed twice, or the mail provider was too dumb to
    61       # provide unique IDs.
    62     end
    63
    64     # advances a month
    65     date += date.days_in_month
    66
    67   end
    68
    69 end
Because we're lazy (well, I am...), we want to parse all the emails of the mailing
list at once, so we loop from the first month up until today (line 48).
We combine the url template and the date to craft the URL of each monthly
archive (line 49). Nice little screen-scraper, ain't it?
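To see what the loop actually asks for, here is a minimal sketch of just the URL-crafting part, using nothing but Ruby's standard library. The archive_urls helper and the fixed stop date are my own assumptions for illustration; the script itself runs up to Date.today and advances with days_in_month, while this sketch uses Date#>> to step one calendar month.

```ruby
require 'date'

# Hypothetical helper, not part of the original script: builds one archive
# URL per month, from first_month up to (but not including) stop.
def archive_urls(template, first_month, stop)
  urls = []
  date = first_month
  while date < stop
    # %Y-%B gives e.g. "2005-January", the naming used in the URL template
    urls << template % date.strftime('%Y-%B')
    date = date >> 1 # Date#>> advances by one calendar month
  end
  urls
end

template = 'http://rubyforge.org/pipermail/nitro-general/%s.txt'
urls = archive_urls(template, Date.new(2005, 1, 1), Date.new(2005, 4, 1))
puts urls
```

This prints the three archive URLs for January, February and March 2005, one per line.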
The next step is to parse the mails (line 51) and loop over them (line 54). Lines 55 and 56 are where the most important thing happens: a new mail is created and then saved into the database.
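One thing to keep in mind: the begin/rescue in the script wraps the whole loop, so the first duplicate Message-ID ends that month's batch and the remaining mails are skipped. If you'd rather skip only the offending mail, the rescue can go inside the loop. Here is a minimal sketch of that idea. It doesn't touch Og or PostgreSQL at all: DuplicateError, FakeStore and save_all are hypothetical stand-ins I made up so the example runs on its own.

```ruby
require 'set'

# Hypothetical stand-in for PGError in the real script.
class DuplicateError < StandardError; end

# Hypothetical stand-in for the database: rejects a Message-ID it has seen.
class FakeStore
  attr_reader :ids

  def initialize
    @ids = Set.new
  end

  def save(message_id)
    # Set#add? returns nil when the element is already present
    raise DuplicateError, message_id unless @ids.add?(message_id)
  end
end

# Saves every mail it can and returns how many were skipped as duplicates.
def save_all(store, message_ids)
  skipped = 0
  message_ids.each do |id|
    begin
      store.save(id)
    rescue DuplicateError
      skipped += 1 # ignore this one mail and carry on with the next
    end
  end
  skipped
end

store = FakeStore.new
skipped = save_all(store, ['<a@x>', '<b@x>', '<a@x>', '<c@x>'])
puts "saved #{store.ids.size}, skipped #{skipped}"
```

With the rescue inside the block, the mail after the duplicate still gets saved, so this prints "saved 3, skipped 1".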
After a chmod +x app/mail/mail_utils.rb, you have a nice little application,
good to go for the Nitro mailing list.
If you haven't already run it in your excitement, do it now :D
Now you've got, like... thousands of mails in your database. Isn't that cool? :D