Rob J Hyndman, 07/24/14
0 replies

Plotting the characteristic roots for ARIMA models

When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).

Ayende Rahien, 07/23/14
0 replies

Avoid where in a reduce clause

We got a customer question about a map/reduce index that produced the wrong results. The cause was a mismatch between the customer's conceptual model and how Map/Reduce actually works.

Rob J Hyndman, 07/22/14
0 replies

I am not an econometrician

Econometrics is often “theory driven” while statistics tends to be “data driven”. I discovered this in the interview for my current job when someone criticized my research for being “data driven” and asked me to respond.

Mike Driscoll, 07/21/14
0 replies

An Intro to Peewee – Another Python ORM

I thought it would be fun to try out a few different Python object relational mappers (ORMs) besides SQLAlchemy. I recently stumbled across a project known as peewee. For this article, we will take the examples from my SQLAlchemy tutorial and port them to peewee to see how it stands up.

Mark Needham, 07/21/14
0 replies

R: ggplot: Problem automatically picking scale for difftime object

I thought it’d be interesting to create some visualisations around the times that people RSVP ‘yes’ to the various Neo4j events that we run in London. I tried to use ggplot to create a bar chart of the data. Unfortunately that resulted in this error:

Mark Needham, 07/17/14
0 replies

R: Apply a Custom Function Across Multiple Lists

In my continued playing around with R, I wanted to map a custom function over two lists, comparing each item with its corresponding item in the other list.
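
As a rough illustration of the idea (in Python rather than R, and not the code from the post itself), pairing two lists element by element and applying a custom function to each pair might look like the sketch below. The names `compare` and `map_pairs` and the sample data are hypothetical.

```python
def compare(left, right):
    """Hypothetical custom function: describe how two paired items relate."""
    if left == right:
        return "same"
    return "higher" if left > right else "lower"


def map_pairs(func, first, second):
    """Apply func to the items at matching positions in two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("lists must have the same length")
    # zip pairs the i-th item of `first` with the i-th item of `second`
    return [func(a, b) for a, b in zip(first, second)]


results = map_pairs(compare, [1, 5, 3], [1, 2, 4])
print(results)  # ['same', 'higher', 'lower']
```

The length check is a design choice: `zip` alone would silently drop the unmatched tail of the longer list.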

Rob J Hyndman, 07/17/14
0 replies

Variations on Rolling Forecasts

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.
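
To make the general idea concrete, here is a minimal Python sketch of one common variation: an expanding training window with one-step-ahead forecasts. It is not the R code from the post, and it assumes a simple mean forecaster as a stand-in for an ARIMA model. The names `mean_forecast` and `rolling_one_step_errors` and the sample series are hypothetical.

```python
def mean_forecast(history):
    """Stand-in model (an assumption): forecast the next value as the mean of the history."""
    return sum(history) / len(history)


def rolling_one_step_errors(series, min_train, forecaster=mean_forecast):
    """Refit on an expanding window and collect one-step-ahead forecast errors."""
    errors = []
    for origin in range(min_train, len(series)):
        train = series[:origin]        # every observation before the forecast origin
        forecast = forecaster(train)   # one-step-ahead forecast from the refitted model
        errors.append(series[origin] - forecast)
    return errors


series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
errors = rolling_one_step_errors(series, min_train=3)
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error across origins
print(errors, mae)
```

Swapping in a different `forecaster` function lets the same loop compare several univariate models on identical forecast origins.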

Jason Baldridge, 07/16/14
0 replies

Emotional Contagion: Contextualizing the Controversy

The past two weeks have seen a great deal of discussion around the recent computational social science study of Kramer, Guillory and Hancock (2014), “Experimental evidence of massive-scale emotional contagion through social networks”.

Doug Turnbull, 07/16/14
0 replies

Improving The Camel Solr Component

We’ve been using Apache Camel a fair amount recently as our ingestion pipeline of choice. It presents a fairly nice DSL for wiring together different data sources, performing transformations, and finally sending data to Solr.

Allen Coin, 07/15/14
0 replies

Data Visualization: A Day in the Life of an NYC Taxi

This visualization displays the data for one random NYC yellow taxi on a single day in 2013. See where it operated, how much money it made, and how busy it was over 24 hours.

Doug Turnbull, 07/15/14
0 replies

Reindexing Collections with Solr’s Cursor Support

When a Solr schema changes, we Solr devs know what’s next: a large reindex of all of our data to capture any changes to index-time analysis. When we deliver solutions to our customers, we frequently need to build this in as a feature.

Yonik Seeley, 07/15/14
0 replies

Solution for multi-term synonyms in Lucene/Solr using the Auto Phrasing TokenFilter

In this post, I show how this filter, combined with a Synonym Filter configured to take advantage of auto phrasing, can help to solve an ongoing problem in Lucene/Solr: how to deal with multi-term synonyms.

Arthur Charpentier, 07/15/14
0 replies

Bayesian Wizardry for Muggles

On Monday, I will be giving the closing talk of the R in Insurance Conference in London, on Bayesian Computations for Actuaries or, to be more specific, Getting into Bayesian Wizardry… (with the eyes of a muggle actuary).

Whitney Baker, 07/13/14
0 replies

The Best of the Week (July 4): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (July 4 to July 11). This week's topics include designing data architecture, Python in universities and other links, R tricks and Big Data white papers.

Brian O'Neill, 07/12/14
0 replies

Applied Big Data: The Freakonomics of Healthcare

Our healthcare system is still (mostly) based on capitalism: more patients + more visits = more money. Within such a system, it is not in the best interest of healthcare providers to have healthy patients.

Angela Ashenden, 07/11/14
0 replies

Overwhelmed by the Volume of Big Data Information Out There? Step This Way...

If you’re trying to navigate your way through the Big Data landscape and can’t see the wood for the trees, allow us to show you the way by cutting right to the chase with this new series of concise and right-to-the-point reports.

Eyal Golan, 07/11/14
0 replies

Parse Elasticsearch Results Using Ruby

One of the modules in our project is an Elasticsearch cluster. To fine-tune the configuration (shards, replicas, mapping, etc.) and the queries, we created a JMeter environment.

Arthur Charpentier, 07/10/14
0 replies

Python's Popularity in Universities and other Big Data Links from Somewhere Else

Some data writings worth reading from Freakonometrics.

Barton George, 07/10/14
0 replies

Sumo Logic and Machine Data Intelligence — DevOps Days Austin

Today’s interview from DevOps Days Austin features Sumo Logic’s co-founder and CTO, Christian Beedgen. If you’re not familiar with Sumo Logic, it’s a log management and analytics service. I caught up with Christian right after he got off stage on day one.

Mark Needham, 07/10/14
0 replies

R/plyr: ddply – Error in vector(type, length) : vector: cannot make a vector of mode ‘closure’.

In my continued playing around with plyr’s ddply function, I was trying to group a data frame by one of its columns and return a count of the number of rows with specific values, and ran into a strange (to me) error message.

John Piekos, 07/09/14
0 replies

Designing a Data Architecture to Support both Fast and Big Data

In this post, I will illustrate how I envision the corporate architecture that will enable companies to achieve the data dream that integrates Fast and Big.

Nati Shalom, 07/09/14
0 replies

How to do real time complex query on Big Data

In the past few years, almost every function of our lives has become dependent on real-time applications. Whether it is updating our friends on every move we make via social media or shopping on e-commerce websites, we have become completely dependent on getting the correct information quickly.

Mark Needham, 07/08/14
0 replies

Data Science: Mo' Data, Mo' Problems

Over the last couple of years I’ve worked on several proof-of-concept style Neo4j projects, and on a lot of them people have wanted to work with their entire data set, which I don’t think makes sense so early on.

Whitney Baker, 07/08/14
0 replies

Leaving Data on the Table: Obstacles to Big Data Analytics

There is frequent conversation about the explosive growth of Big Data in the age of wearables, compulsive social media and ever more capable computers, but when it comes down to gleaning useful insights from data, data scientists face more challenges with variety than with sheer volume.

Mark Needham, 07/08/14
0 replies

R: Aggregate by different functions and join results into one data frame

In continuing my analysis of the London Neo4j meetup group using R I wanted to see which days of the week we organise meetups and how many people RSVP affirmatively by the day.