
Snippets has posted 5883 posts at DZone.

Parse RSS From A Specific Url

04.08.2005

Use this class to fetch the RSS feed at a specific URL and parse it into a hash.

require 'net/http'
require 'rexml/document'

# Fetches an RSS feed over HTTP and parses it into a plain hash.
class ParseRss
	def initialize(url)
		@url = url
	end
	
	def parse
		@content = Net::HTTP.get(URI.parse(@url))
		xml = REXML::Document.new(@content)
		data = {}
		data['title'] = xml.root.elements['channel/title'].text
		data['home_url'] = xml.root.elements['channel/link'].text
		data['rss_url'] = @url
		data['items'] = []
		xml.elements.each('//item') do |item|
			it = {}
			it['title'] = item.elements['title'].text
			it['link'] = item.elements['link'].text
			it['description'] = item.elements['description'].text
			if item.elements['dc:creator']
				it['author'] = item.elements['dc:creator'].text
			end
			if item.elements['dc:date']
				it['publication_date'] = item.elements['dc:date'].text
			elsif item.elements['pubDate']
				it['publication_date'] = item.elements['pubDate'].text
			end
			data['items'] << it
		end
		data
	end
end

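Used like so: ParseRss.new('http://someurl.com/rss').parse. It returns a hash with the keys 'title', 'home_url', 'rss_url', and 'items'. Each entry in 'items' is a hash with 'title', 'link', and 'description', plus 'author' and 'publication_date' when the feed provides them.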
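To make the shape of that hash concrete, here is a minimal sketch of consuming it. The feed contents are made-up sample data built by hand rather than fetched, and `item_lines` is a hypothetical helper, not part of the snippet.

```ruby
# Hand-built hash in the same shape that ParseRss#parse returns.
# All values here are made-up sample data.
feed = {
  'title'    => 'Example Feed',
  'home_url' => 'http://example.com/',
  'rss_url'  => 'http://example.com/rss',
  'items'    => [
    { 'title' => 'First post', 'link' => 'http://example.com/1',
      'description' => 'Hello', 'author' => 'alice',
      'publication_date' => '2005-04-08' },
    # 'author' and 'publication_date' are only set when the feed has them.
    { 'title' => 'Second post', 'link' => 'http://example.com/2',
      'description' => 'No author or date on this one' }
  ]
}

# Hypothetical helper: builds one summary line per item,
# falling back to 'unknown' when the optional author key is missing.
def item_lines(feed)
  feed['items'].map do |item|
    author = item.fetch('author', 'unknown')
    "#{item['title']} (#{author}) #{item['link']}"
  end
end

puts feed['title']
puts item_lines(feed)
```

Because the optional keys may be absent, `fetch` with a default (or a plain `item['author']` nil check) is safer than assuming every item carries them.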

Comments

Derek Harmel replied on Wed, 2007/02/21 - 5:22pm

Here's a revision that does things in a more Ruby-like (DRY) way. I also made it a public class method, since there's not really any reason to create an instance here. Hash keys have been converted to symbols, and "dc:" is stripped out if found.

require 'net/http'
require 'rexml/document'

class RSSParser
	def self.run(url)
		xml = REXML::Document.new Net::HTTP.get(URI.parse(url))
		data = {
			:title    => xml.root.elements['channel/title'].text,
			:home_url => xml.root.elements['channel/link'].text,
			:rss_url  => url,
			:items    => []
		}
		xml.elements.each '//item' do |item|
			new_items = {}
			item.elements.each do |e|
				# Single quotes keep '\1' as a backreference; "\1" would be a control character.
				new_items[e.name.gsub(/^dc:(\w)/, '\1').to_sym] = e.text
			end
			data[:items] << new_items
		end
		data
	end
end
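One detail worth checking in this revision: in REXML, `Element#name` returns the local name without the namespace prefix, while `Element#expanded_name` keeps it, so the `gsub` above may never have anything to strip. The sketch below illustrates this on a made-up feed fragment parsed from a string; the sample XML and variable names are assumptions for illustration only.

```ruby
require 'rexml/document'

# Made-up feed fragment; the dc prefix is declared on the root element.
xml = REXML::Document.new(<<~XML)
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
      <item>
        <title>First post</title>
        <dc:creator>alice</dc:creator>
      </item>
    </channel>
  </rss>
XML

item = xml.elements['//item']

fields = {}   # local name (as a symbol) => element text
names  = []   # [expanded_name, name] pairs, to compare the two accessors
item.elements.each do |e|
  fields[e.name.to_sym] = e.text
  names << [e.expanded_name, e.name]
end

p names
p fields
```

Note that a generic mapping like this keys the author as `:creator` and the date as `:date` or `:pubDate`, rather than the `author` and `publication_date` keys used by the original class.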

Snippets Manager replied on Fri, 2006/06/02 - 8:45pm

Here it is using symbols instead of strings for the returned data.

require 'net/http'
require 'rexml/document'

class ParseRss
	def initialize( url )
		@url = url
	end

	def parse
		@content = Net::HTTP.get( URI.parse( @url ) )
		xml = REXML::Document.new( @content )
		data = {
			:title    => xml.root.elements['channel/title'].text,
			:home_url => xml.root.elements['channel/link'].text,
			:rss_url  => @url,
			:items    => []
		}
		xml.elements.each( '//item' ) do |raw_item|
			item = {
				:title       => raw_item.elements['title'].text,
				:link        => raw_item.elements['link'].text,
				:description => raw_item.elements['description'].text
			}
			if raw_item.elements['dc:creator']
				item[:author] = raw_item.elements['dc:creator'].text
			end
			if raw_item.elements['dc:date']
				item[:publication_date] = raw_item.elements['dc:date'].text
			elsif raw_item.elements['pubDate']
				item[:publication_date] = raw_item.elements['pubDate'].text
			end
			data[:items] << item
		end
		data
	end
end
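All of the versions on this page call `.text` directly on `elements['title']`, `elements['link']`, and `elements['description']`, which raises NoMethodError when an item omits one of them (RSS 2.0 only requires an item to have a title or a description). A minimal sketch of a nil-safe lookup follows; `text_at` is a hypothetical helper and the feed fragment is made-up sample data.

```ruby
require 'rexml/document'

# Hypothetical helper: text of the first element matching path,
# or nil when no such element exists.
def text_at(element, path)
  element.elements[path]&.text
end

# Made-up item that has a title but no link or description.
xml = REXML::Document.new(<<~XML)
  <rss version="2.0">
    <channel>
      <item>
        <title>Title only</title>
      </item>
    </channel>
  </rss>
XML

raw_item = xml.elements['//item']
item = {
  :title       => text_at(raw_item, 'title'),
  :link        => text_at(raw_item, 'link'),
  :description => text_at(raw_item, 'description')
}

p item
```

Missing fields come back as nil instead of aborting the whole parse, so callers can decide how to handle incomplete items.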