By acoin
via java.dzone.com
Published: Jan 18 2013 / 10:47
Last week I wrote a little script in Node.js. Its goal? GET ALL THE DATA! The plan was to scrape a massive dataset off GitHub and do some analysis of programmers’ working habits. The scraping job took the better part of a week. On Saturday morning I had a MongoDB database, running on a small EC2 instance, with a list of 513,900 repositories. The entries were not guaranteed to be unique.
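Since the scraped list can contain the same repository more than once, a deduplication pass is a natural first step before any analysis. Here is a minimal sketch of one way to do it in plain Node.js. The `full_name` field, the `dedupeRepos` helper, and the sample records are all made up for illustration; the original script's record shape is not shown, so treat this as an assumption rather than what the scraper actually stored.

```javascript
// Hypothetical record shape: { full_name: 'owner/repo', ... }.
// The field name is an assumption, not taken from the original script.
function dedupeRepos(repos) {
  const seen = new Map();
  for (const repo of repos) {
    // Keep only the first record seen for each repository name.
    if (!seen.has(repo.full_name)) {
      seen.set(repo.full_name, repo);
    }
  }
  return Array.from(seen.values());
}

// Made-up sample data, only here to exercise the function.
const sample = [
  { full_name: 'alice/scraper', language: 'JavaScript' },
  { full_name: 'bob/tools', language: 'Python' },
  { full_name: 'alice/scraper', language: 'JavaScript' },
];

const unique = dedupeRepos(sample);
console.log(unique.length + ' unique of ' + sample.length + ' records');
```

A `Map` keyed on the repository name keeps the pass linear in the number of records and preserves the order in which repositories were first seen. For a list of this size it may be more practical to deduplicate inside the database instead, but that depends on how the collection is set up.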