Channel Description:

Machine learning. Deep learning.

    Updated May 22, 2014 0xdata (“Hexa-data”) is a small group of smart people from Stanford and Silicon Valley with VC backing and an open source software project for advanced analytics (H2O).  Founded in 2011, 0xdata first appeared on analyst dashboards in 2012 and has steadily built a presence in the data science community since then. […]

    Updated and bumped July 10, 2014. For a PowerPoint version on SlideShare, go here. Introduction: Apache Spark is an open source distributed computing framework for advanced analytics in Hadoop.  Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and top-level status in February 2014.  According to one analyst, Apache […]

  • 01/03/14--05:56: Analytic Startups: Skytree
  • Skytree started out as an academic machine learning project developed at Georgia Tech’s Fastlab.  Leadership shopped the software to a number of software vendors prior to 2011 and, finding no buyers, launched as a standalone venture in 2012. In April 2013, Skytree announced Series A funding of $18 million, with backing from U.S. Venture Partners, […]

    Much has changed since I last blogged on this subject a year ago (here and here).  This is the first of a three-part blog covering the current state of play for machine learning in Hadoop.  I use the term “machine learning” deliberately, to refer to tools that can learn from data in an automated or […]

    This is the second of a three-part series on the current state of play for machine learning in Hadoop.  Part One is here.  In this post, we cover open source options. As we noted in Part One, machine learning is one of several technologies for analytics; the broader category also includes fast queries, streaming analytics […]

    A colleague asks: can we automate predictive modeling? How we answer the question depends on the context.   Consider the two variations on the question below, with more precise wording: Can we completely eliminate the need for expertise in predictive modeling — so that an “ordinary business user” can do it? Can we make expert […]

  • 04/22/14--08:34: Analytic User Personas
  • Analytic users are not all the same; in most organizations, there are a number of different user “personalities”, or personas, with distinct needs.  If you develop an analytics architecture for your organization or develop analytic software to sell to others, it is important to understand these personas.  In this essay, I profile four personas: Power […]

  • 04/23/14--08:26: Python for Analytics
  • A reader complains that I did not include Python in a survey of Machine Learning in Hadoop.  It’s a fair point.  There was a lively debate last year between R and Python advocates, variously described as a war or a boxing match.  Matt Asay argued that Python is displacing R; Sharon Machlis and David Smith countered.  In […]

    Can we leverage distributed computing for machine learning and predictive analytics? The question keeps surfacing in different contexts, so I thought I’d take a few minutes to write an overview of the topic. The question is important for four reasons: Source data for analytics frequently resides in distributed data platforms, such as MPP appliances or […]

    There are formal methods and tools you can use to optimize marketing spend, including software from SAS, IBM and HP (among others).  The usefulness of these methods, however, depends on basic disciplines that are missing from many Marketing organizations. In this post I’d like to propose some informal rules for marketing optimization.  These do not exclude using […]

  • 11/04/14--05:32: SAS in Hadoop: An Update
  • SAS supports several different products that run “inside” Hadoop based on two different in-memory architectures: (1) The SAS High Performance Analytics suite, originally designed to run in dedicated Teradata and Greenplum appliances, includes five modules: Statistics, Data Mining, Text Mining, Econometrics and Optimization. (2) A second set of products — SAS Visual Analytics, SAS Visual Statistics and […]

  • 12/01/14--06:00: SAS Versus R (Part 1)
  • Which is better for analytics, SAS or R?  One frequently sees discussions on this topic in social media; for examples, see here, here, here, here, here and here.   Like many debates in social media, the degree of conviction is often inverse to the quantity of information, and these discussions often produce more heat than light. […]

  • 12/15/14--06:00: SAS Versus R Part Two
  • In a previous post, I summarized some myths about SAS and R — arguments offered by proponents of one or the other that deserve to be dismissed. In this post, I will review some arguments that do make sense — things to consider if you are an aspiring analyst or if you are an executive […]

    Strata+Hadoop World week is a good opportunity to update the list of platforms for high-performance advanced analytics.  Vendors are hustling this week to announce their latest enhancements; I’ll post updates as needed. First, some definitions.  The scope of this analysis includes software with the following properties: support for supervised and unsupervised machine learning; support for distributed […]

    Stories about SAS Visual Analytics are among the most widely read posts on this blog.  In the last two years I’ve received many queries from readers who complain that it’s hard to get clear answers about the software from SAS. In software procurement, the customer has bargaining power until the deal closes; after that, power […]

  • 06/12/15--14:12: Spark 1.4 Released
  • On June 11, the Spark team announced availability of Release 1.4.  More than 210 contributors from 70 different organizations contributed more than 1,000 patches.  Spark continues to expand its contributor base, the best measure of health for an open source project. Spark Core: The Spark team continues to improve Spark operability, performance and compatibility.  Key enhancements […]

    O’Reilly releases its 2015 Data Science Salary Survey.  The report, authored by John King and Roger Magoulas, summarizes results from an ongoing web survey.  The 2015 survey includes responses from “over 600” participants, down from the “over 800” tabulated in 2014. The authors note that the survey includes self-selected respondents from the O’Reilly audience and […]

    A group of scientists affiliated with IBM and several universities report on a detailed analysis of MapReduce and Spark performance across four different workloads.  In this benchmark, Spark outperformed MapReduce on Word Count, k-Means and Page Rank, while MapReduce outperformed Spark on Sort. On the ADT Dev Watch blog, Dave Ramel summarizes the paper, arguing that it “brings into question..Databricks Daytona GraySort […]

  • 02/13/16--13:47: IBM and Spark (Updated)
  • Updated March 8, 2016.  After publishing this post, I met with several IBM executives at Spark Summit East, who confirmed the accuracy of the original post and provided additional detail, which I’ve included in this version.  Updates are in bold red italics. IBM also provided the low-resolution image. IBM has a good story to tell — […]

  • 01/31/17--21:32: The Year in SQL Engines
  • As an addendum to my year-end review of machine learning and deep learning, I offer this survey of SQL engines. SQL is the most widely used language for data science according to O’Reilly’s 2016 Data Science Salary Survey. Most projects require at least some SQL operations, and many need nothing but SQL. This review covers six open source leaders: Hive, Impala,
