Big Data/BI Zone
Eric Gregory, 04/22/13

Cloud Deployments: Using Hadoop on Clouds

Packt Publishing has provided Chapter 10 of its forthcoming Hadoop MapReduce Cookbook for DZone readers, covering Hadoop and Amazon Elastic MapReduce.

Erich Styger, 04/22/13

Why I don’t like printf()

I have a strong opinion, and a rule for using printf(): don't use it.

Eric Gregory, 04/22/13

Actuarial Analytics with R

Jim Guszcza from the Wisconsin School of Business leads this tutorial on actuarial analytics with R.

Eric Gregory, 04/21/13

Statistical Aspects of Data Mining with R

David Mease's "Statistical Aspects of Data Mining" course, taught a few years back at both Stanford and Google, is a great introduction to data mining and R.

Arthur Charpentier, 04/21/13

Data News: "Reverse Causality," the Online Population, and More

In this data link roundup from Arthur Charpentier, there's more on Reinhart-Rogoff, a look at "reverse-causality," plus: what percentage of the world population is actually online?

Christopher Taylor, 04/20/13

Weighing Privacy in the Age of Ubiquitous Data

The speed with which the Boston Marathon bombing suspects were identified was a remarkable sign that we’re in the age of ubiquitous photos and video of the public square, albeit at a major international event.

Arthur Charpentier, 04/20/13

Data News: Reinhart-Rogoff, Rule by Algorithm, and More

Lots of data news lately: Arthur Charpentier's roundup covers Reinhart-Rogoff, Kaggle, what algorithms tell us about the language of news, and much more.

Eric Gregory, 04/19/13

Links You Don't Want To Miss (Apr. 19)

Today: Mozilla's pluggable collaboration tool, CISPA, homemade drones, a radical new CSS best practice, and Code Monkey Saves World.

John Cook, 04/19/13

Moments of Mixtures in Python

I needed to compute the higher moments of a mixture distribution for a project I’m working on. I’m writing up the code here in case anyone else finds this useful. (And in case I’ll find it useful in the future.)
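The core idea can be sketched briefly. The raw moments of a mixture are the weight-averaged raw moments of its components, and central moments follow from raw moments by a binomial expansion. The Python below is a minimal sketch that assumes Gaussian components; it is not John Cook's code, and the function names (normal_raw_moments, mixture_raw_moments, central_moment) are hypothetical.

```python
from math import comb


def normal_raw_moments(mu, sigma, n):
    """Return [E[X^0], ..., E[X^n]] for X ~ N(mu, sigma^2).

    Assumes Gaussian components and uses the recurrence
    m_k = mu * m_{k-1} + (k - 1) * sigma^2 * m_{k-2}.
    """
    m = [1.0, float(mu)]
    for k in range(2, n + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma**2 * m[k - 2])
    return m[: n + 1]


def mixture_raw_moments(weights, mus, sigmas, n):
    """Raw moments of a mixture: weighted sum of component raw moments."""
    total = [0.0] * (n + 1)
    for w, mu, s in zip(weights, mus, sigmas):
        for k, mk in enumerate(normal_raw_moments(mu, s, n)):
            total[k] += w * mk
    return total


def central_moment(raw, k):
    """k-th central moment E[(X - mean)^k], expanded binomially in raw moments."""
    mean = raw[1]
    return sum(comb(k, j) * raw[j] * (-mean) ** (k - j) for j in range(k + 1))


# Hypothetical example: equal-weight mixture of N(-1, 1) and N(1, 1).
raw = mixture_raw_moments([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], 4)
variance = central_moment(raw, 2)
kurtosis = central_moment(raw, 4) / variance**2
print(raw[1], variance, kurtosis)
```

For this mixture the sketch gives mean 0, variance 2 and fourth central moment 10, so the kurtosis is 10 / 2² = 2.5, below the value of 3 for a single normal distribution. Swapping in another component family only requires replacing normal_raw_moments with that family's raw moments.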

Eric Gregory, 04/19/13

8 Predictive Analytics Questions Answered by the Guy Who Wrote the Book

Eric Siegel, author of the recent Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, takes on eight big questions.

Bootstrap Mark..., 04/19/13

Will Hadoop Replace the Data Warehouse?

In the early days of data warehousing, there was a raging debate between two architectural approaches. There was a camp that advocated Ralph Kimball’s federated data mart architecture, and a camp that advocated Bill Inmon’s enterprise data warehouse architecture.

Doug Turnbull, 04/18/13

Querying More Fields != More Results

Let’s take our new knowledge for a test drive with this puzzler: Why would adding a field to qf cause our result set to actually shrink in size?

Mike Driscoll, 04/18/13

Python Gets Funded by DARPA for Big Data Project: Blaze

I first heard about Blaze from the blog of NumPy’s original developer back in December 2012. Recently, InformationWeek announced that DARPA was funding the project to the tune of $3 million.

Christopher Taylor, 04/18/13

Structure No Longer Has a Say on What's Data or Not

Not so long ago, businesses didn’t care about information outside the normal structure of trusted outlets like print media, trade journals, academic research and other trusted system-generated information.

Swathi Venkatachala, 04/18/13

Monitoring S3 Uploads for Real-Time Data

If you are working on Big Data and bleeding-edge technologies like Hadoop, the first thing you need is a "dataset" to work on.

Eric Gregory, 04/17/13

Dev of the Week: A. Jesse Jiryu Davis

This week we're talking to A. Jesse Jiryu Davis, a developer at 10gen specializing in MongoDB, Python, Tornado, and JavaScript.

Damaris Coll, 04/17/13

Detecting Social Capitalists on Twitter with Graph Databases

Nicolas Dugué and Anthony Perez from the University of Orléans introduce new techniques to detect social capitalists on Twitter using graph databases.

Tony Russell-Rose, 04/17/13

Search that Sucks

It isn’t until we encounter flawed design that we are jolted out of our flow and forced to make choices that don’t seem to fit with either our expectations or the natural course of our activity.

Mike Driscoll, 04/17/13

The Python Brochure Project

If you’ve been struggling to get Python adopted at your place of work, this brochure might help as it showcases how Python is used in business in various fields from industry and science to education and government.

John Cook, 04/16/13

Social Networks in Fact and Fiction

For instance, the social networks of the Iliad and Beowulf look more like actual social networks than does the social network of Harry Potter. Real social networks follow a power law distribution more closely than do social networks in works of fiction.

Paul Miller, 04/16/13

Thoughts on the European Data Forum

I travelled to Ireland last week, to attend the second meeting of the European Data Forum (EDF). The EDF provided travel support for my trip, and I am grateful to them for that.

Christopher Taylor, 04/16/13

The Quiet Creep of Facial Recognition

Today’s facial recognition software lies at an interesting intersection of three concepts: how long things are stored, pictures in which a person appears, and people who can recognize faces in a photo (and tag them).

Jonathan Callahan, 04/16/13

Using R with Geospatial Data

GIS: an acronym that brings joy to some and strikes fear into the hearts of those not interested in buying expensive software. Luckily, fight or flight can be saved for another day, because you don’t need to be a GIS jock with a wad of cash to work with spatial data and make beautiful plots.

Arthur Charpentier, 04/15/13

Reserving with Negative Increments in Triangles

A few months ago, I published a post (in French) on negative values in triangles and how to deal with them when using a Poisson regression. The idea was to use a translation technique...

Doug Turnbull, 04/15/13

How to Debug Solr with Eclipse

Recently I was puzzled by some behavior Solr was showing me. I scratched my head and called over a colleague. We couldn’t quite figure out what was going on. Well, Solr is open source, so… next stop: Debuggersville!