Paul Miller 05/05/13
2280 views
0 replies
Enigma pulls data from tens of thousands of public data sets, and then offers up an interface that makes it pretty straightforward to trawl through the whole lot in search of the data points that you actually need. The company’s Marc DaCosta introduced it as a “search and discovery platform for public data.”
Joe Stein 05/05/13
2324 views
0 replies
Cloudera has a great toolkit for working with Hadoop. Specifically, it focuses on building distributed systems and services on top of the Hadoop ecosystem.
Steven Lott 05/04/13
2550 views
0 replies
A data warehouse preserves data. It can be argued that a data warehouse preserves only data. This, however, is false. To an extent, a data warehouse must also preserve processing details.
Giuseppe Vettigli 05/04/13
595 views
0 replies
This Refcard is a collection of code examples that introduces the reader to the principal Data Mining tasks using Python.
Eric Gregory 05/03/13
2129 views
0 replies
Day to day, it's easy to lose sight of what it means to live in the future we've made, to take it for granted.
Christopher Taylor 05/03/13
4260 views
0 replies
Data scientists are in short supply! Or at least that’s a headline you can find nearly everywhere. There are people trying desperately to hire them and also people trying hard to jump into the perceived gap and become one. Meanwhile, there’s plenty of skepticism over whether the role is real or a function of all of the hype.
Sam Taha 05/03/13
2398 views
1 reply
There are a lot of changes occurring these days with the Big Data revolution, such as cloud computing, NoSQL, columnar stores, and virtualization, to mention just a few of the fast-moving technologies that are transforming how we manage our data and run our IT operations.
Ravi Kalakota 05/03/13
1567 views
0 replies
"Big data companies are terrible at communicating their value proposition. It needs good storytellers and marketers who can talk about its business value."
Vince Sesto 05/02/13
2757 views
0 replies
I was excited to hear the news that Packt Publishing (http://www.packtpub.com/) was releasing a new book dedicated to Splunk called “Implementing Splunk - Big Data Reporting and Development for Operational Intelligence” (http://www.packtpub.com/implementing-splunk/book). A majority of the documentation and information on Splunk has been produced by Splunk, so I was eager to see if “Implementing Splunk” was going to be a fresh take on the large amount of information that is currently out there. “Implementing Splunk” was written by Vincent Bumgarner, who has been designing software for close to 20 years, has worked with Splunk since 2007, and has been helping companies use the application as a Business Intelligence, Reporting and Analytics Tool.
Eric Gregory 05/02/13
1974 views
0 replies
At Strata 2013, Microsoft's Dave Campbell talks about how the Xbox leverages big data...
John Cook 05/02/13
1669 views
0 replies
I was playing around with SymPy, a symbolic math package for Python, and ran across nsimplify. It takes a floating point number and tries to simplify it: as a fraction with a small denominator, square root of a small integer, an expression involving famous constants, etc.
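The "fraction with a small denominator" case can be illustrated without SymPy. The sketch below is not nsimplify itself: it swaps in the standard library's Fraction.limit_denominator as a stand-in for that one case, and the helper name simplest_fraction and the default denominator bound are assumptions made for illustration.

```python
from fractions import Fraction


def simplest_fraction(x, max_denominator=100):
    """Return the closest fraction to x whose denominator is at most max_denominator.

    Hypothetical helper: a stdlib stand-in for the fraction case only,
    not SymPy's nsimplify.
    """
    return Fraction(x).limit_denominator(max_denominator)


if __name__ == "__main__":
    print(simplest_fraction(0.333333333333))                  # 1/3
    print(simplest_fraction(0.75))                            # 3/4
    print(simplest_fraction(3.14159265, max_denominator=10))  # 22/7
```

Square roots and named constants are outside what this stand-in can recognize; those are the cases where a symbolic package earns its keep.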
Arthur Charpentier 05/02/13
2952 views
0 replies
Today, Olivier Scaillet gave a great talk on fast recursive projections. After lunch, we discussed financial model complexity, mentioning that sometimes, traders and quants are lost, and it might be good to spend more time on basics than on very advanced stuff.
Christopher Taylor 05/01/13
2080 views
0 replies
It seems big data means something different to everyone. In the great debate/hype about big data, there’s no lack of opinion on the topic and it seems to mostly depend on an individual’s product, skill set and business challenges.
John Cook 05/01/13
1155 views
0 replies
My previous post looked at rolling 5 six-sided dice as an approximation of a normal distribution. If you wanted a better approximation, you could roll dice with more sides, or you could roll more dice. Which helps more?
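One rough way to explore the question is to compare excess kurtosis, which is 0 for a normal distribution. The sketch below assumes that measure as a proxy for "closeness to normal" (the post itself may use a different one), and the function name excess_kurtosis is hypothetical. It relies on the fact that the excess kurtosis of a sum of n independent, identical dice is the single-die value divided by n.

```python
from fractions import Fraction


def excess_kurtosis(sides, dice):
    """Exact excess kurtosis of the sum of `dice` fair dice with `sides` sides each."""
    faces = range(1, sides + 1)
    mean = Fraction(sum(faces), sides)
    variance = sum((f - mean) ** 2 for f in faces) / sides
    fourth_moment = sum((f - mean) ** 4 for f in faces) / sides
    single_die = fourth_moment / variance ** 2 - 3
    # For a sum of independent, identically distributed dice,
    # excess kurtosis scales as 1 / (number of dice).
    return single_die / dice


if __name__ == "__main__":
    for sides, dice in [(6, 5), (12, 5), (6, 10)]:
        print(f"{dice} dice, {sides} sides: {float(excess_kurtosis(sides, dice)):.4f}")
```

By this one measure, doubling the number of dice moves the value closer to 0 than doubling the number of sides does; other measures of closeness could weigh the trade-off differently.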
Arthur Charpentier 05/01/13
1176 views
0 replies
I recently gave a talk at the Mathematical Finance Days, held at HEC Montréal on Monday and Tuesday, on advanced methods in trees, with (as mentioned in the subtitle of the first slide) some thoughts on teaching mathematical finance.
Eric Gregory 05/01/13
1282 views
0 replies
Stanford's Amr Awadallah argues that Hadoop is the "data operating system of the future."
John Cook 04/30/13
2592 views
0 replies
A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. This Python code calculates the normal distribution of the sum of the dice.
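A minimal sketch of that kind of calculation, not the post's own code: it builds the exact distribution of the sum by repeated convolution and reports the largest gap between its CDF and a normal CDF with the same mean and standard deviation. The function names and the half-unit continuity correction are assumptions made for illustration.

```python
import math


def dice_sum_pmf(dice, sides=6):
    """Exact PMF of the sum of `dice` fair dice; index i holds P(sum == dice + i)."""
    pmf = [1.0]
    for _ in range(dice):
        new = [0.0] * (len(pmf) + sides - 1)
        for i, p in enumerate(pmf):
            for face in range(sides):
                new[i + face] += p / sides
        pmf = new
    return pmf


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def max_cdf_gap(dice, sides=6):
    """Largest absolute difference between the dice-sum CDF and the matching normal CDF."""
    mean = dice * (sides + 1) / 2
    sd = math.sqrt(dice * (sides ** 2 - 1) / 12)
    gap, cumulative = 0.0, 0.0
    for i, p in enumerate(dice_sum_pmf(dice, sides)):
        cumulative += p
        # Continuity correction: evaluate the normal CDF half a unit above the total.
        total = dice + i
        gap = max(gap, abs(cumulative - normal_cdf(total + 0.5, mean, sd)))
    return gap


if __name__ == "__main__":
    print(f"max CDF gap for 5 six-sided dice: {max_cdf_gap(5):.5f}")
```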
Chase Seibert 04/30/13
2335 views
0 replies
Schema design in NoSQL is very different from schema design in an RDBMS. Once you get something like HBase up and running, you may find yourself staring blankly at a shell, lost in the possibilities of creating your first table.
Paul Hammant 04/30/13
4087 views
0 replies
Source-control backing is a decade-long obsession of mine, and now I'm thinking about “open data.” If something can be represented by a textual document, is structured or regular, and tends towards completeness over time, then source-control is a viable alternative to a relational schema (or a document store).
Eric Gregory 04/30/13
687 views
0 replies
In this seriously in-depth Pycon talk, we learn how to use Python to scrape data from web sources not conventionally built to supply it.
John Berryman 04/29/13
1578 views
0 replies
I ran into an interesting problem today. I was working with the first project where we legitimately needed Solr soft commits and in testing my configuration I wanted to prove to myself that the soft commits were performing as expected.
Paul Miller 04/29/13
2293 views
0 replies
As the Big Data hype machine continues its relentless attempt to gobble everything in its path, new business units and entire new domains buying into the promise find themselves faced with unanticipated data volume and complexity.
Eric Gregory 04/29/13
639 views
0 replies
Via LinkedIn TechTalks, Rob Bekkerman delves into the basics of machine learning.
Chanwit Kaewkasi 04/29/13
5123 views
0 replies
This article shows how to develop a MongoDB application quickly with ZK & Grails.
Mats Lindh 04/28/13
1642 views
0 replies
This error may occur if you’re using sort=geodist() in your Solr Spatial / Geographic Search. The reason is probably that you have an empty pt= value or that the parameter is missing altogether.
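One way to guard against this on the client side is to refuse to build the request unless a point is available. The sketch below only assembles the query string; the helper name geodist_sort_params, the field name "location", and the sample coordinates are hypothetical, while sfield, pt, and sort=geodist() are the spatial parameters the error concerns.

```python
from urllib.parse import urlencode


def geodist_sort_params(lat, lon, sfield="location", query="*:*"):
    """Build Solr parameters for a distance sort, refusing an empty pt value.

    Hypothetical helper; "location" is an assumed spatial field name.
    """
    if lat is None or lon is None:
        raise ValueError("pt needs both a latitude and a longitude")
    return {
        "q": query,
        "sfield": sfield,
        "pt": f"{lat},{lon}",
        "sort": "geodist() asc",
    }


# Sample coordinates are placeholders for illustration.
query_string = urlencode(geodist_sort_params(59.91, 10.75))

if __name__ == "__main__":
    print(query_string)
```

Failing early in application code tends to be easier to diagnose than a server-side error triggered by a blank parameter.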