The billion-dollar question facing executives everywhere: how do I monetize my data? What small-data or big-data monetization strategies should I adopt? Which analytical investments and strategies really increase revenue? Which pilots should I run to test data monetization ideas?
Get started with PostgreSQL in about 30 pages.
You know how I keep banging on about attracting different types of people into programming? You know how we say we need to get them young?
A/B testing gets a lot of attention on Hacker News, inbound.org, and other forums, and appeals to me as a data analysis exercise. As a software engineer with a practical bent, I like the concept of data analysis techniques which produce useful results while treating a system as a black box.
It's important to address language and platform incompatibility as consequences of technology modernization. Language incompatibility is the reason software conversions must sometimes be done by hand: when no tool can perform the conversion, we must convert manually.
Quite often I have long procedures that I want to run overnight. However, my computer would then keep running all night after the script has finished.
In this really excellent talk from Strata 2013, Twitter's Nathan Marz walks through the challenges and serious rewards of building systems that are resilient even in the face of human error...
When you expand (x + y)^n, the coefficients increase then decrease. The largest coefficient is in the middle if n is even; it’s the two in the middle if n is odd. For example, the coefficients for (1 + x)^4 are 1, 4, 6, 4, 1 and the coefficients for (1 + x)^5 are 1, 5, 10, 10, 5, 1.
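The claim is easy to check numerically (my addition, not from the linked post), since the coefficients of (x + y)^n are just the binomial coefficients C(n, 0) through C(n, n):

```python
from math import comb

def coefficients(n):
    """Coefficients of (x + y)^n: C(n, 0), C(n, 1), ..., C(n, n)."""
    return [comb(n, k) for k in range(n + 1)]

print(coefficients(4))  # [1, 4, 6, 4, 1] -- single largest value in the middle
print(coefficients(5))  # [1, 5, 10, 10, 5, 1] -- the two middle values tie
```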
Brian O'Neill gives us a deep dive into the "Big Data Quadfecta" in this talk from the Philly JUG.
I'm using 64-bit Linux, and I was having a lot of trouble getting Calibre to convert books into the Kindle-loving mobi format...
You may have seen the joke “Enter any 12-digit prime number to continue.” I’ve seen it floating around as the punchline in several contexts.
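As an aside (my own sketch, not from the linked post): producing a 12-digit prime takes only a few lines with a Miller-Rabin primality test. The fixed base set below is known to make the test deterministic for all n below roughly 3.3 × 10^24, which comfortably covers 12 digits:

```python
from itertools import count

def is_prime(n):
    """Miller-Rabin, deterministic for n < ~3.3e24 with this base set."""
    if n < 2:
        return False
    bases = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

# The smallest 12-digit number is 10^11; scan upward for the first prime.
p = next(n for n in count(10**11) if is_prime(n))
print(p)
```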
Suppose you have a very large dataset - far too large to hold in memory - with duplicate entries. You want to know how many duplicate entries, but your data isn't sorted, and it's big enough that sorting and counting is impractical. How do you estimate how many unique entries the dataset contains?
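The linked post builds toward a probabilistic answer; as a taste of the genre, here is my own sketch of the K-minimum-values estimator (a relative of HyperLogLog, and not necessarily the post's method). Hash every item into [0, 1); if the hashes are uniform, the k-th smallest hash sits near k / n, so n can be estimated from it using constant memory:

```python
import hashlib

def estimate_uniques(items, k=256):
    """K-minimum-values sketch: keep only the k smallest distinct hashes."""
    seen = set()  # never holds more than k values -- constant memory
    for item in items:
        h = hashlib.md5(str(item).encode()).digest()
        v = int.from_bytes(h[:8], "big") / 2**64  # map to [0, 1)
        if v not in seen:
            seen.add(v)
            if len(seen) > k:
                seen.remove(max(seen))
    if len(seen) < k:
        # Fewer than k distinct hashes: the count is exact
        # (ignoring astronomically unlikely 64-bit collisions).
        return len(seen)
    # k-th smallest hash ~= k / n, so estimate n as (k - 1) / kth_smallest.
    return int((k - 1) / max(seen))

# 10,000 unique values, each appearing three times:
data = [i % 10_000 for i in range(30_000)]
print(estimate_uniques(data))  # close to 10,000
```

The relative error of this estimator shrinks like 1/sqrt(k), so larger k trades memory for accuracy.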
With the release of Solr 4.2 we’ve got the possibility to use the HTTP protocol to get information about Solr index structure. Let's look at the new API by example.
It’s one thing to know abstractly that Silicon Valley is home to most computer-related companies; it’s quite another to drive down Highway 101 and see another well-known company every 30 seconds or so.
Data scientist Monica Rogati discusses data scaling at LinkedIn and reflects on the evolving role of the data scientist.
A new feature of Lucene 4 – pluggable codecs – allows for the modification of Lucene’s underlying storage engine. Working with codecs and examining their output yields fascinating insights into how exactly Lucene’s search works in its most fundamental form.
As CBS showed, Intel is already selling software that recognizes the rough demographic of an individual in order to deliver a more targeted advertisement, and as the report stated, “Big Brother is no longer Big Government. Big Brother is Big Business,” but without the rules that restrict government.
Arthur Charpentier's regular data link roundup explores quantified consensus on anthropogenic global warming, compares SAS and R for business analysts, and much more. Plus: zombies (with R).
Now, I have to confess that I have been surprised, while I was looking for mathematical models for shuffling, to find so many deterministic techniques (and results related to algebra, and cycles).
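One classic deterministic result (my illustration, not necessarily one of the post's examples): the perfect "out" riffle shuffle is a fixed permutation, and on a 52-card deck it has order eight, so eight perfect shuffles return the deck to its original order:

```python
def out_shuffle(deck):
    """Perfect riffle shuffle: cut in half and interleave exactly,
    with the original top card staying on top."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    return [card for pair in zip(top, bottom) for card in pair]

deck = list(range(52))
shuffled = out_shuffle(deck)
shuffles = 1
while shuffled != deck:
    shuffled = out_shuffle(shuffled)
    shuffles += 1
print(shuffles)  # 8
```

The cycle length is the multiplicative order of 2 modulo 51, which is 8 (2^8 = 256 = 5 × 51 + 1).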
Given a list of dates, how would you rewrite them so that two (or more) consecutive dates are displayed together?
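One common approach (my sketch with hypothetical helper names, not necessarily the post's solution): subtract the index from each sorted date. That difference is constant across any run of consecutive days, so it works as a grouping key:

```python
from datetime import date, timedelta
from itertools import groupby

def collapse_runs(dates):
    """Collapse sorted dates into (start, end) pairs of consecutive runs."""
    runs = []
    # date - i days is identical for every date in a consecutive run
    for _, group in groupby(enumerate(sorted(dates)),
                            key=lambda pair: pair[1] - timedelta(days=pair[0])):
        run = [d for _, d in group]
        runs.append((run[0], run[-1]))
    return runs

dates = [date(2013, 3, 1), date(2013, 3, 2), date(2013, 3, 3), date(2013, 3, 7)]
print(collapse_runs(dates))  # runs: 2013-03-01..2013-03-03, then 2013-03-07 alone
```

From the (start, end) pairs, rendering "Mar 1–3" versus a lone "Mar 7" is just string formatting.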
This is the first blog post in a series which looks at some data organization patterns in MapReduce. We’ll look at how to bucket output across multiple files in a single task, how to multiplex data across multiple files, and also how to coalesce data. These are all common patterns that are useful to have in your MapReduce toolkit.
"When we’re faced with a 'prove or disprove,' we’re usually better off trying first to disprove with a counterexample, for two reasons..."
Recently, I read this post about Richard Stallman's (RMS) visit to India, and then read more about how his visit impacted the country.
We need a way to match queries to entities in our Postgres database. At first, this might seem like a simple problem with a simple solution, especially if you’re using the ORM; just jam the user input into an ORM filter and retrieve every matching string. But there’s a problem.
Lisp practically has no syntax. It simply has parenthesized expressions. This makes it very easy to start using the language. And above all, it makes it easy to treat code as data. Lisp macros are very powerful, and these macros are made possible by the fact that the language is simple to parse.