
Jerry has posted 1 post at DZone.

Compete.com Webstats Scrape Groovy

01.16.2012
This script collects web statistics from compete.com. It takes as input a list of the domains you want to analyze and writes the compete.com web stats for each domain to a single CSV file.
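import com.gargoylesoftware.htmlunit.WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion

// Input: one domain name per line. Output: one CSV holding the stats for all domains.
def domainList = new File("/root/Desktop/Morningstar/AlexaTop3000.txt").readLines()
def outFile = new File("/root/Desktop/Morningstar/CompeteStats3000.csv")
outFile.delete()  // start from an empty output file on every run
def wc = new WebClient( BrowserVersion.FIREFOX_3_6 )

domainList.each {
  def domainName = it.trim()
  if ( !domainName ) return  // skip blank lines in the input list
  println domainName
  def url = "http://siteanalytics.compete.com/export_csv/${domainName}/"
  def page = wc.getPage( url )
  def pageLines = page.getContent().split("\n")

  // Skip the first four lines of each export (presumably header lines) and
  // prefix every remaining line with the quoted domain name.
  pageLines.eachWithIndex { line, lineCount ->
    if ( lineCount > 3 ) {
      outFile.append( "\"${domainName}\",${line}\n" )
    }
  }
  sleep( 400 )  // pause 400 ms between requests
}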
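
The per-domain transformation in the script above (skip the first four lines of the export, then prefix each remaining line with the quoted domain name) can be pulled out into a small function so it can be tried without any network access. This is only a sketch: the helper name toOutputRows, its headerLines parameter, and the sample text are hypothetical and not part of the original script, and the sample is made-up data rather than real compete.com output.

```groovy
// Hypothetical helper (not in the original script): turns the raw CSV text
// for one domain into output rows prefixed with the quoted domain name.
// headerLines = 4 mirrors the `lineCount > 3` check in the script above.
def toOutputRows(String domainName, String csvText, int headerLines = 4) {
    def lines = csvText.split("\n") as List
    lines.drop(headerLines).collect { line ->
        "\"${domainName}\",${line}".toString()
    }
}

// Made-up sample text, used only to show the shape of the transformation.
def sample = "h1\nh2\nh3\nh4\n2011-12,100\n2012-01,120"
def rows = toOutputRows("example.com", sample)
rows.each { println it }
```

Inside the main loop, the same helper could be fed page.getContent() and its rows appended to the output file, which keeps the download step and the formatting step separate.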