Big Data/BI Zone
Doug Turnbull, 04/18/13
2408 views
0 replies

Querying More Fields != More Results

Let’s take our new knowledge for a test drive with this puzzler: Why would adding a field to qf cause our result set to actually shrink in size?

Mike Driscoll, 04/18/13
3479 views
0 replies

Python Gets Funded by DARPA for Big Data Project: Blaze

I first heard about Blaze from NumPy’s original developer’s blog back in December 2012. Recently InformationWeek announced that DARPA was funding the project to the tune of $3 million.

Christopher Taylor, 04/18/13
1388 views
0 replies

Structure No Longer Has a Say on What's Data or Not

Not so long ago, businesses didn’t care about information outside the normal structure of trusted outlets like print media, trade journals, academic research and other trusted system-generated information.

Swathi Venkatachala, 04/18/13
386 views
0 replies

Monitoring S3 Uploads for Real-Time Data

If you are working with Big Data and bleeding-edge technologies such as Hadoop, the first thing you need is a "dataset" to work on.

Eric Gregory, 04/17/13
6237 views
0 replies

Dev of the Week: A. Jesse Jiryu Davis

This week we're talking to A. Jesse Jiryu Davis, a developer at 10gen specializing in MongoDB, Python, Tornado, and JavaScript.

Damaris Coll, 04/17/13
3327 views
0 replies

Detecting Social Capitalists on Twitter with Graph Databases

Nicolas Dugué and Anthony Perez from the University of Orleans introduce new techniques to detect social capitalists on Twitter using Graph Databases.

Tony Russell-Rose, 04/17/13
2139 views
0 replies

Search that Sucks

It isn’t until we encounter flawed design that we are jolted out of our flow and forced to make choices that don’t seem to fit with either our expectations or the natural course of our activity.

Mike Driscoll, 04/17/13
935 views
0 replies

The Python Brochure Project

If you’ve been struggling to get Python adopted at your place of work, this brochure might help as it showcases how Python is used in business in various fields from industry and science to education and government.

John Cook, 04/16/13
2901 views
0 replies

Social Networks in Fact and Fiction

For instance, the social networks of the Iliad and Beowulf look more like actual social networks than does the social network of Harry Potter. Real social networks follow a power law distribution more closely than do social networks in works of fiction.

Paul Miller, 04/16/13
1927 views
0 replies

Thoughts on the European Data Forum

I travelled to Ireland last week, to attend the second meeting of the European Data Forum (EDF). The EDF provided travel support for my trip, and I am grateful to them for that.

Christopher Taylor, 04/16/13
1330 views
0 replies

The Quiet Creep of Facial Recognition

Today’s facial recognition software lies at an interesting intersection of three concepts: How long things are stored for, pictures in which a person appears, and people who can recognize faces in a photo (and tag them).

Jonathan Callahan, 04/16/13
1945 views
0 replies

Using R with Geospatial Data

GIS is an acronym that brings joy to some and strikes fear into the hearts of those not interested in buying expensive software. Luckily, fight or flight can be saved for another day, because you don’t need to be a GIS jock with a wad of cash to work with spatial data and make beautiful plots.

Arthur Charpentier, 04/15/13
1512 views
0 replies

Reserving with Negative Increments in Triangles

A few months ago, I published a post on negative values in triangles and how to deal with them when using a Poisson regression (the post was published in French). The idea was to use a translation technique...

Doug Turnbull, 04/15/13
2199 views
0 replies

How to Debug Solr with Eclipse

Recently I was puzzled by some behavior Solr was showing me. I scratched my head and called over a colleague. We couldn’t quite figure out what was going on. Well Solr is open source so… next stop – Debuggersville!

Daniel Bartl, 04/15/13
1692 views
0 replies

Cassandra 1.1 – Tuning for Frequent Column Updates

Cassandra is known for its good write performance. But there are scenarios in which you might run into trouble, especially when a particular use case generates heavy disk I/O. This can be the case for columns that receive frequent updates. However, you can avoid those problems with proper configuration, or simply by updating to a recent Cassandra version.

Jessica Thornsby, 04/15/13
344 views
0 replies

WANdisco Releases New Version of Hadoop Distro

WDD is a fully tested, production-ready version of Apache Hadoop 2 that’s free to download. WDD version 3.1.1 includes an enhanced, more intuitive user interface that simplifies Hadoop cluster deployment. WDD 3.1.1 supports SUSE Linux Enterprise Server 11 (Service Pack 2), in addition to RedHat and CentOS.

Kay Cichini, 04/14/13
2575 views
0 replies

Download Files from Google Drive/Docs Programmatically with R

Following up on my last post on how to download files from the cloud with R...

Kay Cichini, 04/13/13
1834 views
0 replies

Tweaking Movie Subtitles with R

I use R to fix subtitles that are not in sync with my movies. For the example below the subs were showing too early - so I added some time to each sequence in the srt file. For simplicity I used exactly 1 second in the below example.

Giuseppe Vettigli, 04/12/13
1864 views
0 replies

Odd-Even Sort Visualized

The Odd/Even sort is a sorting algorithm that uses the concept of the Bubble Sort to move elements around. Unlike Bubble Sort, the Odd/Even sort compares disjoint pairs of neighbors, alternating between pairs that start at odd index values and pairs that start at even index values, which splits the sorting into distinct phases.
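
As a rough illustration of the alternating phases described in this summary, here is a minimal Python sketch. It is not taken from the linked article; the function name odd_even_sort and the choice to run the odd phase before the even phase are assumptions made for the example.

```python
def odd_even_sort(items):
    """Return a sorted copy of items using odd-even (brick) sort.

    Illustrative sketch only; the name and phase order are assumptions.
    """
    data = list(items)  # work on a copy so the input is left untouched
    n = len(data)
    swapped = True
    # Keep alternating phases until a full odd+even pass makes no swap.
    while swapped:
        swapped = False
        # start=1: pairs (1,2), (3,4), ...  start=0: pairs (0,1), (2,3), ...
        for start in (1, 0):
            for i in range(start, n - 1, 2):
                if data[i] > data[i + 1]:
                    data[i], data[i + 1] = data[i + 1], data[i]
                    swapped = True
    return data


if __name__ == "__main__":
    print(odd_even_sort([5, 3, 8, 1, 9, 2]))
```

Because the pairs compared within one phase never overlap, each phase could in principle be run in parallel, which is the usual motivation for this variant over plain Bubble Sort.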

John Berryman, 04/12/13
1982 views
0 replies

Modifying Solr Result Relevancy Via An “Auxiliary Boost” Field

English is a confusing language. I mean, does it really make sense that you can park in a driveway or drive in a parkway? Also, I’ve always been amused that there actually exists a class of words that are their own antonyms – so-called “auto-antonyms.”

Arthur Charpentier, 04/12/13
943 views
0 replies

Data Roundup: The Hidden Biases of Big Data and More

The hidden biases in Big Data, a PlayDoh 3D printer, and much much more in Arthur Charpentier's data links roundup.

Sarfraz Khan, 04/12/13
1813 views
0 replies

Using MVC4 WebAPI for CRUD Operations on MongoDB

MVC 4 Web API project supporting CRUD operations on MongoDB.

Kay Cichini, 04/11/13
2200 views
0 replies

Download Files from Dropbox Programmatically with R

Here is a useful snippet that I stole from qdap::url_dl to download files from my Dropbox to the working directory.

Arthur Charpentier, 04/11/13
2329 views
0 replies

Diary of an Addict

After four days offline, I have to face the truth: I am a computer addict. Here is a diary of the last four days, ostensibly without touching my computer, at work and at home.

Eric Gregory, 04/11/13
644 views
0 replies

Google on Open Source

Chris DiBona, open source manager at Google, talks about licences and patents, open source at Google, and more.