Paul Miller 05/05/13
2280 views
0 replies

Find the Data, Aggregate the Data, Make the Data Useful

Enigma pulls data from tens of thousands of public data sets, and then offers up an interface that makes it pretty straightforward to trawl through the whole lot in search of the data points that you actually need. The company’s Marc DaCosta introduced it as a “search and discovery platform for public data.”

Joe Stein 05/05/13
2324 views
0 replies

Using Scala To Work With Hadoop

Cloudera has a great toolkit for working with Hadoop. Specifically, it focuses on building distributed systems and services on top of the Hadoop ecosystem.

Steven Lott 05/04/13
2550 views
0 replies

Legacy Code Preservation: Data Warehouse and Legacy Operations

A data warehouse preserves data. It can be argued that a data warehouse preserves only data. This, however, is false. To an extent, a data warehouse must also preserve processing details.

Giuseppe Vettigli 05/04/13
595 views
0 replies

A new RefCard from the GlowingPython!

This Refcard is a collection of code examples that introduces the reader to the principal Data Mining tasks using Python.

Eric Gregory 05/03/13
2129 views
0 replies

Big Data and the Human Right to Education

Day to day, it's easy to lose sight of what it means to live in the future we've made, to take it for granted.

Christopher Taylor 05/03/13
4260 views
0 replies

Sorry, You Can't Simply Hire a Data Scientist

Data scientists are in short supply! Or at least that’s a headline you can find nearly everywhere. There are people trying desperately to hire them and also people trying hard to jump into the perceived gap and become one. Meanwhile, there’s plenty of skepticism over whether the role is real or a function of all of the hype.

Sam Taha 05/03/13
2398 views
1 reply

BigQuery: Data Warehouse in the Clouds

The Big Data revolution is bringing a lot of change these days. Cloud computing, NoSQL, columnar stores, and virtualization are just a few of the fast-moving technologies that are transforming how we manage our data and run our IT operations.

Ravi Kalakota 05/03/13
1567 views
0 replies

Big Data Needs Storytellers: An Interview with Gary Vaynerchuk

"Big data companies are terrible at communicating their value proposition. It needs good storytellers and marketers who can talk about its business value."

Vince Sesto 05/02/13
2757 views
0 replies

Implementing Splunk

I was excited to hear that Packt Publishing (http://www.packtpub.com/) was releasing a new book dedicated to Splunk, “Implementing Splunk - Big Data Reporting and Development for Operational Intelligence” (http://www.packtpub.com/implementing-splunk/book). Most of the documentation and information on Splunk has been produced by Splunk itself, so I was eager to see whether “Implementing Splunk” would offer a fresh take on the large amount of information already out there. The book was written by Vincent Bumgarner, who has been designing software for close to 20 years, has worked with Splunk since 2007, and has been helping companies use the application as a business intelligence, reporting, and analytics tool.

Eric Gregory 05/02/13
1974 views
0 replies

Big Data and the Xbox

At Strata 2013, Microsoft's Dave Campbell talks about how the Xbox leverages big data...

John Cook 05/02/13
1669 views
0 replies

Recognizing Special Numbers with nsimplify

I was playing around with SymPy, a symbolic math package for Python, and ran across nsimplify. It takes a floating point number and tries to simplify it: as a fraction with a small denominator, square root of a small integer, an expression involving famous constants, etc.
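
To make the teaser concrete, here is a minimal sketch of calling nsimplify, assuming SymPy is installed. The input values are illustrative choices and are not taken from the original post.

```python
from sympy import nsimplify, pi

# A float that is really a fraction with a small denominator.
as_fraction = nsimplify(0.75)

# A float that is approximately the square root of a small integer.
as_root = nsimplify(1.4142135623730951)

# A loose tolerance accepts a rougher but simpler answer.
rough_pi = nsimplify(pi, tolerance=0.01)

print(as_fraction, as_root, rough_pi)
```

Passing a list of constants as the second argument (for example `[pi]`) asks nsimplify to look for expressions built from those constants as well.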

Arthur Charpentier 05/02/13
2952 views
0 replies

Financial Model Complexity

Today, Olivier Scaillet gave a great talk on fast recursive projections. After lunch, we discussed financial model complexity, mentioning that sometimes, traders and quants are lost, and it might be good to spend more time on basics than on very advanced stuff.

Christopher Taylor 05/01/13
2080 views
0 replies

Big Data is Everything and Nothing (Depending on Who You Ask)

It seems big data means something different to everyone. In the great debate/hype about big data, there’s no lack of opinion on the topic and it seems to mostly depend on an individual’s product, skill set and business challenges.

John Cook 05/01/13
1155 views
0 replies

More Sides or More Dice?

My previous post looked at rolling 5 six-sided dice as an approximation of a normal distribution. If you wanted a better approximation, you could roll dice with more sides, or you could roll more dice. Which helps more?
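
One way to explore the question numerically is sketched below. It computes the exact distribution of the dice sum and measures the largest gap between its CDF and the matching normal CDF. The error measure (maximum CDF gap with a continuity correction), the function names, and the dice configurations are assumptions made for this illustration, not necessarily the approach taken in the post.

```python
from math import erf, sqrt


def dice_sum_pmf(num_dice, sides):
    """Exact PMF of the sum of num_dice fair dice, as {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(num_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / sides
        pmf = nxt
    return pmf


def normal_cdf(x, mean, std):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def max_cdf_error(num_dice, sides):
    """Largest gap between the dice-sum CDF and the matching normal CDF."""
    mean = num_dice * (sides + 1) / 2.0
    std = sqrt(num_dice * (sides ** 2 - 1) / 12.0)
    pmf = dice_sum_pmf(num_dice, sides)
    running, worst = 0.0, 0.0
    for total in sorted(pmf):
        running += pmf[total]
        # Continuity correction: compare at total + 0.5.
        gap = abs(running - normal_cdf(total + 0.5, mean, std))
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    # Baseline, more sides, more dice.
    for n, s in [(5, 6), (5, 12), (10, 6)]:
        print(n, "dice with", s, "sides:", max_cdf_error(n, s))
```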

Arthur Charpentier 05/01/13
1176 views
0 replies

Advanced Methods in Trees

I gave a talk recently at the Mathematical Finance Days, held at HEC Montréal on Monday and Tuesday, on advanced methods in trees, with (as mentioned in the subtitle of the first slide) some thoughts on teaching mathematical finance.

Eric Gregory 05/01/13
1282 views
0 replies

Hadoop: Data Operating System of the Future?

Stanford's Amr Awadallah argues that Hadoop is the "data operating system of the future."

John Cook 04/30/13
2592 views
0 replies

Rolling Dice for Normal Samples in Python

A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. This Python code calculates the normal distribution of the sum of the dice.
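
As a rough illustration of the idea, the sketch below sums a handful of simulated dice and rescales the total to have mean 0 and variance 1. It is not the code from the post; the function name and default parameters are assumptions made for this example.

```python
import random
from math import sqrt


def dice_normal_sample(rng, num_dice=5, sides=6):
    """Return one approximately standard-normal value from a sum of dice."""
    total = sum(rng.randint(1, sides) for _ in range(num_dice))
    # One fair die has mean (sides + 1) / 2 and variance (sides**2 - 1) / 12.
    mean = num_dice * (sides + 1) / 2.0
    std = sqrt(num_dice * (sides ** 2 - 1) / 12.0)
    return (total - mean) / std


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a classroom demo is repeatable
    samples = [dice_normal_sample(rng) for _ in range(10000)]
    sample_mean = sum(samples) / len(samples)
    sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)
    print("mean:", sample_mean, "variance:", sample_var)
```

The samples take only a limited set of discrete values, so this is suitable for demonstrations rather than serious simulation.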

Chase Seibert 04/30/13
2335 views
0 replies

HBase Schema Introduction for Programmers

Schema design in NoSQL is very different from schema design in an RDBMS. Once you get something like HBase up and running, you may find yourself staring blankly at a shell, lost in the possibilities of creating your first table.

Paul Hammant 04/30/13
4087 views
0 replies

Open Data Backed by Source Control

Source-control backing is a decade-long obsession of mine, and now I'm thinking about “open data.” If something can be represented by a textual document, is structured or regular, and tends towards completeness over time, then source-control is a viable alternative to a relational schema (or a document store).

Eric Gregory 04/30/13
687 views
0 replies

Pulling Data From Pages That Don't Expect It

In this seriously in-depth PyCon talk, we learn how to use Python to scrape data from web sources not conventionally built to supply it.

John Berryman 04/29/13
1578 views
0 replies

Understanding Solr Soft Commits And Data Durability

I ran into an interesting problem today. I was working on the first project where we legitimately needed Solr soft commits, and in testing my configuration I wanted to prove to myself that the soft commits were performing as expected.

Paul Miller 04/29/13
2293 views
0 replies

Visualisation – the key that unlocks data’s value?

As the Big Data hype machine continues its relentless attempt to gobble everything in its path, new business units and entire new domains buying into the promise find themselves faced with unanticipated data volume and complexity.

Eric Gregory 04/29/13
639 views
0 replies

LinkedIn TechTalk: Machine Learning Basics

Via LinkedIn TechTalks, Rob Bekkerman delves into the basics of machine learning.

Chanwit Kaewkasi 04/29/13
5123 views
0 replies

Develop a MongoDB Application with ZK & Grails

This article shows how to develop a MongoDB application quickly with ZK & Grails.

Mats Lindh 04/28/13
1642 views
0 replies

SolrException: can not sort on unindexed field: geodist()

This error may occur if you’re using sort=geodist() in your Solr Spatial / Geographic Search. The reason is probably that you have an empty pt= value or that the parameter is missing altogether.
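
A minimal sketch of guarding against that mistake on the client side follows. It builds the request parameters in Python and refuses to produce a query when the point is missing. The helper name and the field name "location" are hypothetical; sfield, pt, and sort are standard Solr spatial parameters.

```python
from urllib.parse import urlencode


def geodist_sort_params(lat, lon, sfield="location", query="*:*"):
    """Build Solr parameters for sorting by distance from (lat, lon).

    The field name "location" is a placeholder for your own spatial field.
    """
    if lat is None or lon is None:
        # An empty or missing pt is the likely cause of the error above.
        raise ValueError("pt needs both a latitude and a longitude")
    return {
        "q": query,
        "sfield": sfield,          # spatial field that geodist() measures against
        "pt": f"{lat},{lon}",      # the reference point, as "lat,lon"
        "sort": "geodist() asc",   # nearest results first
    }


# Example coordinates chosen only for illustration.
query_string = urlencode(geodist_sort_params(45.15, -93.85))
print(query_string)
```

Failing early in the client keeps an empty pt= from ever reaching Solr.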