Big Data/Analytics Zone
Mark Needham 08/27/14
2525 views
0 replies

R: Rook - Hello world example - 'Cannot find a suitable app in file'

I’ve been playing around with the Rook library and struggled a bit getting a basic Hello World application up and running so I thought I should document it.

Mikio Braun 08/27/14
2429 views
0 replies

Big Data & Machine Learning Convergence

As these two fields converge, work has to be done to provide the right set of mechanisms and abstractions. Right now I still think there is a considerable gap which we need to close over the next few years.

Arthur Charpentier 08/27/14
2444 views
0 replies

Computational Actuarial Science, with R

A collection of datasets, originally for the book ‘Computational Actuarial Science with R’ edited by Arthur Charpentier (CAS with R). Now, the package contains a large variety of actuarial datasets.

Alec Noller 08/26/14
1771 views
0 replies

Refcard Expansion Pack: Getting Started with Apache Hadoop

This week, DZone released its latest Refcard: Getting Started with Apache Hadoop. If you're interested in learning more about Hadoop or sharpening your skills, we decided to dig into the DZone archives and find some of the most popular posts we've had on the topic.

Mahboob Hussain 08/26/14
6593 views
5 replies

Thoughts on Hibernate

There is a mismatch between the way data are laid out in the columns of tables and the way they are used in the application as class or instance variables. However, this mismatch, or "impedance", does not hinder software development so much that it requires a framework that abstracts away all the database-related code.

Saurabh Chhajed 08/26/14
2752 views
3 replies

How to Setup Realtime Analytics over Logs with ELK Stack

The ELK stack is Elasticsearch, Logstash and Kibana. Together, these three provide a fully working real-time data analytics tool for extracting useful information from your data.

Giuseppe Vettigli 08/22/14
3961 views
0 replies

Quick HDF5 with Pandas

HDF5 is a format designed to store large numerical arrays of homogeneous type. It comes in particularly handy when you need to organize your data models in a hierarchical fashion and also need a fast way to retrieve the data. Pandas implements a quick and intuitive interface for this format, and this post briefly introduces how it works.

Robert Diana 08/21/14
4103 views
0 replies

Geek Reading August 20, 2014

These items are a combination of tech business news, development news and programming tools and techniques.

Gil Allouche 08/20/14
11503 views
0 replies

Hadoop 101: An Explanation of the Hadoop Ecosystem

Hadoop is not a single piece of technology. It's composed of an entire ecosystem of tools that companies can choose from to create their big data solution.

Doug Turnbull 08/20/14
2152 views
0 replies

Introducing Splainer: The Open Source Search Sandbox That Tells You Why

This is the entire art and science of search relevancy. It's not magic gnomes inside a box that understand all about baby bottles. No, it's heavily tuned heuristics that Solr and Elasticsearch use out of the box.

Mark Needham 08/19/14
3203 views
0 replies

Where does R Studio install packages/libraries?

As a newbie to R I wanted to look at the source code of some of the libraries/packages that I’d installed via R Studio which I initially struggled to do as I wasn’t sure where the packages had been installed.

Robert Diana 08/19/14
5101 views
0 replies

Geek Reading August 18, 2014

These items are a combination of tech business news, development news and programming tools and techniques.

Mike Bushong 08/18/14
4136 views
0 replies

Graph Theory and Calculating Network Topologies

Any network can be represented as a graph. The switches in the network are the vertices or nodes in the graph, the links between them the edges.
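
As a rough illustration of the idea in this teaser, a network can be held as an adjacency map in which each switch is a vertex and each link is an undirected edge. This is only a minimal sketch: the switch names and the `build_topology` helper are hypothetical and are not taken from the linked post.

```python
from collections import defaultdict


def build_topology(links):
    """Build an undirected adjacency map from (switch, switch) link pairs."""
    graph = defaultdict(set)
    for a, b in links:
        # A link is an undirected edge, so record it in both directions.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical four-switch network, used only for illustration.
links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw3"), ("sw3", "sw4")]
topology = build_topology(links)

for switch in sorted(topology):
    print(switch, "->", sorted(topology[switch]))
```

A map of sets keeps neighbour lookups cheap and silently ignores duplicate links, which suits a sketch like this one; a real topology model would likely also carry link attributes such as bandwidth.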

John Cook 08/15/14
9271 views
0 replies

What would Donald Knuth do?

I’ve seen exhortations to think like Leonardo da Vinci or Albert Einstein, but these leave me cold. I can’t imagine thinking like either of these men. But here are a few famous computer scientists I could imagine emulating when trying to solve a problem. What would you add to the list?

Angela Ashenden 08/15/14
397 views
0 replies

Big Data gets wire data as ExtraHop shares its special source

Though these wider applications of IT Operations Analytics are starting to be mentioned in case study quotes, they are not yet at the forefront of ExtraHop’s own publicity.

Mark Needham 08/14/14
4248 views
0 replies

R: Grouping by two variables

In my continued playing around with R and meetup data I wanted to group a data table by two variables – day and event – so I could see the most popular day of the week for meetups and which events we’d held on those days.

Trevor Parsons 08/14/14
3033 views
0 replies

How D3 can help you build effective data visualizations

This post will show the value of D3 in handling this manipulation when different elements are resized, and will show that there are other steps we can take to be clever in designing and developing graphs.

Arnon Rotem-Gal-Oz 08/13/14
2695 views
0 replies

Introduction to Big Data - Presentation

I presented Big Data to Amdocs’ product group last week. One of the sessions I did was recorded, so I might be able to add it here later. Meanwhile, you can check out the slides.

Steven Lott 08/12/14
4451 views
0 replies

Some Basic Statistics

I've always been fascinated by the essential statistical algorithms. First, some basics. Once we have these, though, the definitions of mean and standard deviation become simple and kind of cool.
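
The teaser does not say what the "basics" are. One common reading, assumed here, is that they are three running sums: the count, the sum of the values, and the sum of their squares. Under that assumption, the mean and the population standard deviation reduce to a few lines. The function name and sample data below are hypothetical.

```python
import math


def mean_and_stdev(values):
    """Return (mean, population standard deviation) from three basic sums."""
    s0 = len(values)                 # count of values
    s1 = sum(values)                 # sum of values
    s2 = sum(v * v for v in values)  # sum of squared values
    mean = s1 / s0
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(s2 / s0 - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Hypothetical sample data, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean, stdev = mean_and_stdev(sample)
print(mean, stdev)  # prints 5.0 2.0
```

This sums-of-squares form is compact but can lose precision when the values are large relative to their spread; Python's `statistics.pstdev` is a more robust choice for real data.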

Mark Needham 08/11/14
2456 views
0 replies

R: Aggregate by different functions and join results into one data frame

In continuing my analysis of the London Neo4j meetup group using R I wanted to see which days of the week we organise meetups and how many people RSVP affirmatively by the day.

Michael McCandless 08/07/14
3828 views
0 replies

A new proximity query for Lucene, using automatons

As of Lucene 4.10 there will be a new proximity query to further generalize on MultiPhraseQuery and the span queries: it allows you to directly build an arbitrary automaton expressing how the terms must occur in sequence, including any transitions to handle slop.

Trevor Parsons 08/05/14
6460 views
0 replies

How to combine D3 with AngularJS

As we all know, Angular and D3 frameworks are very popular, and once they work together they can be very powerful and helpful when creating dashboards.

Lijin Joseji 08/05/14
3868 views
0 replies

6 sparkling features of Apache Spark!

What is Apache Spark? Why is there such serious buzz about it? If you are in the Big Data analytics business, should you really care about Spark? I hope this post will help answer some of the questions that might have been coming to your mind these days.

Jakub Holý 08/04/14
2711 views
0 replies

Most interesting links of July '14

A curated collection of the most interesting articles, links, and news in the programming world from last month, July of 2014.

Robert Diana 08/01/14
5265 views
0 replies

Geek Reading July 31, 2014

These items are a combination of tech business news, development news and programming tools and techniques.