Ravi Kalakota, 05/25/13
765 views
0 replies

Data Monetization is the End Goal

The billion dollar question facing executives everywhere: How do I monetize my data? What small data or big data monetization strategies should I adopt? Which analytical investments and strategies really increase revenue? What pilots should I run to test data monetization ideas out?

Jayadevan Maymala, 05/25/13
192 views
0 replies

Instant PostgreSQL Starter

Get started with PostgreSQL in about 30 pages.

Trisha Gee, 05/24/13
1706 views
0 replies

Be an Ambassador for Programming

You know how I keep banging on about attracting different types of people into programming? You know how we say we need to get them young?

Gary Sieling, 05/24/13
2624 views
0 replies

My First A/B Test

A/B testing gets a lot of attention on Hacker News, inbound.org, and other forums, and appeals to me as a data analysis exercise. As a software engineer with a practical bent, I like the concept of data analysis techniques which produce useful results while treating a system as a black box.

Steven Lott, 05/24/13
1603 views
0 replies

Legacy Code Preservation: Language Incompatibility and Technology Evolution

It's important to address language or platform incompatibility as a consequence of technology modernization. We convert software manually because of language incompatibility: when no tool can do the conversion, we must do it by hand.

Kay Cichini, 05/24/13
1262 views
0 replies

R Quick Tip: Shutdown Windows after Script Has Finished

Quite often I have long procedures that I want to run overnight. However, my computer would keep running all night after the script has finished.

Eric Gregory, 05/23/13
1723 views
0 replies

Building Human Fault-Tolerant Systems

In this really excellent talk from Strata 2013, Twitter's Nathan Marz walks through the challenges and serious rewards of building systems that are resilient even in the face of human error...

John Cook, 05/23/13
1402 views
0 replies

The Rise and Fall of Binomial Coefficients

When you expand (x + y)^n, the coefficients increase then decrease. The largest coefficient is in the middle if n is even; it’s the two in the middle if n is odd. For example, the coefficients for (1 + x)^4 are 1, 4, 6, 4, 1 and the coefficients for (1 + x)^5 are 1, 5, 10, 10, 5, 1.
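
The rise-and-fall pattern described above is easy to check numerically. The following is a minimal sketch in Python; the helper names `binomial_row` and `peak_positions` are made up for illustration and do not come from the article.

```python
from math import comb  # available in Python 3.8+


def binomial_row(n):
    """Return the coefficients of (x + y)^n, from k = 0 to k = n."""
    return [comb(n, k) for k in range(n + 1)]


def peak_positions(n):
    """Return the indices k at which the coefficient C(n, k) is largest."""
    row = binomial_row(n)
    largest = max(row)
    return [k for k, c in enumerate(row) if c == largest]


print(binomial_row(4))    # [1, 4, 6, 4, 1]
print(binomial_row(5))    # [1, 5, 10, 10, 5, 1]
print(peak_positions(4))  # [2]     one peak in the middle when n is even
print(peak_positions(5))  # [2, 3]  two equal peaks when n is odd
```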

Brian O'Neill, 05/23/13
1274 views
0 replies

Big Data Overview and Cassandra Plunge

Brian O'Neill gives us a dive into the "Big Data Quadfecta" from Philly JUG.

Col Wilson, 05/23/13
123 views
0 replies

Python _imaging Cannot Open Shared Object File

I'm using a 64 bit Linux, and I was having a lot of trouble getting Calibre to convert books into the Kindle-loving mobi format...

John Cook, 05/22/13
1176 views
0 replies

Need a 12-Digit Prime?

You may have seen the joke “Enter any 12-digit prime number to continue.” I’ve seen it floating around as the punchline in several contexts.

Nick Johnson, 05/22/13
4705 views
1 reply

Algorithm of the Week: Damn Cool Cardinality Estimation

Suppose you have a very large dataset - far too large to hold in memory - with duplicate entries. You want to know how many duplicate entries there are, but your data isn't sorted, and it's big enough that sorting and counting is impractical. How do you estimate how many unique entries the dataset contains?
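
One way to get a feel for the problem is a k-minimum-values (KMV) sketch, shown below in Python. This is an illustrative stand-in and not necessarily the algorithm the article covers. It assumes items can be converted to strings, and the names `hash01` and `estimate_distinct` are hypothetical. The idea: hash every item to a number in [0, 1), remember only the k smallest distinct hash values, and infer the total from how tightly those values are packed near zero.

```python
import hashlib
import heapq


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(items, k=1024):
    """Estimate the number of distinct items in one pass with O(k) memory."""
    heap = []     # negated values, so heap[0] is minus the largest kept hash
    kept = set()  # the same hash values, for fast duplicate checks
    for item in items:
        h = hash01(item)
        if h in kept:
            continue  # a duplicate of something already kept
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h belongs among the k smallest: swap out the current largest
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)


# 60,000 entries drawn from 20,000 distinct values
data = [i % 20000 for i in range(60000)]
print(estimate_distinct(data))
```

The estimate is approximate; its relative error shrinks roughly like 1/sqrt(k), so a larger k trades memory for accuracy.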

Rafał Kuć, 05/22/13
2451 views
0 replies

Solr 4.2: Index Structure Reading API

Solr 4.2 makes it possible to use HTTP to get information about the Solr index structure. Let's look at the new API by example.

Mikio Braun, 05/22/13
1672 views
0 replies

A Visit to the Valley

It’s one thing to know abstractly that the Silicon Valley is home to most computer-related companies, and to drive down Highway 101 and see another well-known company every 30 seconds or so.

Eric Gregory, 05/21/13
2097 views
0 replies

Data Science at LinkedIn

Data scientist Monica Rogati discusses data scaling at LinkedIn and reflects on the evolving role of the data scientist.

Doug Turnbull, 05/21/13
5466 views
0 replies

How Does a Search Engine Work? An Educational Trek Through A Lucene Postings Format

A new feature of Lucene 4 – pluggable codecs – allows for the modification of Lucene’s underlying storage engine. Working with codecs and examining their output yields fascinating insights into how exactly Lucene’s search works in its most fundamental form.

Christopher Taylor, 05/21/13
1907 views
0 replies

"Say Goodbye to Anonymity"

As CBS showed, Intel is already selling software to recognize the rough demographic of an individual in order to deliver a more targeted advertisement, and as the report stated, “Big Brother is no longer Big Government. Big Brother is Big Business,” but without the rules that restrict government.

Arthur Charpentier, 05/21/13
1184 views
0 replies

Quantifying Scientific Consensus, Zombies in R, and More Data Links

Arthur Charpentier's regular data link roundup explores quantified consensus on anthropogenic global warming, compares SAS and R for business analysts, and much more. Plus: zombies (with R).

Arthur Charpentier, 05/20/13
2211 views
0 replies

The Many Mathematical Models of the Shuffle

Now, I have to confess that I have been surprised, while I was looking for mathematical models for shuffling, to find so many deterministic techniques (and results related to algebra, and cycles).

Raymond Camden, 05/20/13
1369 views
0 replies

Converting a List of Dates into a Shorter, Combined List

Given a list of dates, how would you rewrite them so that two (or more) consecutive dates are displayed together?
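
One possible approach, sketched here in Python rather than in whatever language the article uses: sort the dates, then extend the current range whenever the next date is exactly one day after the range's end. The function names and the "start to end" output format are assumptions made for illustration.

```python
from datetime import date, timedelta


def group_consecutive(dates):
    """Collapse dates into (start, end) tuples covering consecutive runs."""
    ranges = []
    for d in sorted(set(dates)):  # sort and drop exact duplicates
        if ranges and d - ranges[-1][1] == timedelta(days=1):
            ranges[-1][1] = d      # d extends the current run by one day
        else:
            ranges.append([d, d])  # d starts a new run
    return [(start, end) for start, end in ranges]


def format_ranges(ranges):
    """Render single days as one date and longer runs as 'start to end'."""
    parts = []
    for start, end in ranges:
        if start == end:
            parts.append(start.isoformat())
        else:
            parts.append(f"{start.isoformat()} to {end.isoformat()}")
    return ", ".join(parts)


days = [date(2013, 5, 20), date(2013, 5, 21), date(2013, 5, 22), date(2013, 5, 25)]
print(format_ranges(group_consecutive(days)))
# 2013-05-20 to 2013-05-22, 2013-05-25
```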

Alex Holmes, 05/20/13
1387 views
0 replies

Bucketing, Multiplexing and Combining in Hadoop - Part 1

This is the first blog post in a series which looks at some data organization patterns in MapReduce. We’ll look at how to bucket output across multiple files in a single task, how to multiplex data across multiple files, and also how to coalesce data. These are all common patterns that are useful to have in your MapReduce toolkit.

John Cook, 05/20/13
157 views
0 replies

Prove or Disprove

"When we’re faced with a 'prove or disprove,' we’re usually better off trying first to disprove with a counterexample, for two reasons..."

Tharindu Mathew, 05/19/13
1421 views
1 reply

Moving Education Toward Open Source in Sri Lanka

Recently, I read this post about Richard Stallman’s (RMS) visit to India, and then read more about how RMS’s visit impacted the country.

George London, 05/19/13
1822 views
0 replies

Postgres Fuzzy Search Using Trigrams (+/- Django)

We need a way to match queries to entities in our Postgres database. At first, this might seem like a simple problem with a simple solution, especially if you’re using the ORM; just jam the user input into an ORM filter and retrieve every matching string. But there’s a problem.
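
To illustrate the idea behind trigram matching, here is a small pure-Python sketch. It is loosely modeled on how Postgres's pg_trgm extension is documented to behave (lowercase each word, pad it with two leading spaces and one trailing space, compare sets of three-character substrings), but it is an approximation for illustration only: it is not the article's code and involves neither Postgres nor Django. The function names are hypothetical.

```python
import re


def trigrams(text):
    """Return the set of three-character substrings of each padded word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matches(query, candidates, threshold=0.3):
    """Return candidates scoring at least `threshold`, best first."""
    scored = [(similarity(query, c), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= threshold]


print(sorted(trigrams("cat")))
print(similarity("word", "two words"))  # 4 shared out of 11 distinct trigrams
```

Unlike an exact-match filter, a query with a typo still shares most of its trigrams with the intended string, so it can still score above the threshold.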

John Cook, 05/18/13
2164 views
0 replies

Extreme Syntax

Lisp practically has no syntax. It simply has parenthesized expressions. This makes it very easy to start using the language. And above all, it makes it easy to treat code as data. Lisp macros are very powerful, and these macros are made possible by the fact that the language is simple to parse.