Amazon Creates a Machine Learning Service

Amazon announced yesterday that it has entered the machine learning market with a full suite of tools to help companies make better predictions. A few aspects of the new service, called Amazon Machine Learning, raise questions. Chief among them is that models cannot be exported from Amazon’s servers, so a payment is required every time a model is created, modified, or applied to a new dataset.

Prices are reported to be low in comparison to offerings from IBM, Microsoft and Google. Of course, for less complex problems, open-source machine-learning algorithms are already freely available to anyone with a computer and a web connection.

Will the predictions be more accurate than those generated by open-source tools? Possibly. That depends on two factors. The first is the accuracy and cleanliness of the data used to train the model, which is outside of Amazon’s control, although the service provides tools for identifying and reducing noise. The second is the size of the training dataset, and here Amazon’s infrastructure is a clear advantage, assuming the model is complex enough to benefit from the additional data. Amazon states that the maximum file size for reading into the model is 100 GB.

Speed and cost become limiting factors as model complexity increases. Open-source tools can be run on Amazon EC2 server instances without turning over ownership of the model itself to Amazon, and I have yet to see a cost comparison between an EC2 instance and the machine learning service. Such a comparison, however, would have to account for a healthy reduction in setup time, because Amazon controls the hardware and software, which eliminates the problem of resource allocation. Amazon’s speed advantage grows when the input data, such as web traffic, are already hosted on its servers, since fewer steps are needed to aggregate and load the data into a model.
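As a rough illustration of what such a cost comparison might look like, here is a minimal Python sketch. It assumes the managed service bills a fee per model build and a fee per application of a model to a dataset, which is only a simplification of the "payment every time" point above, and that the self-managed route costs instance hours plus the staff time spent on setup. Every function name, parameter, and fee structure here is hypothetical; no actual Amazon prices are used or implied.

```python
def self_managed_cost(instance_hours, hourly_rate, setup_hours, staff_hourly_rate):
    """Hypothetical cost of running open-source tools on a rented instance.

    Setup time is priced as staff time, since that is the part a managed
    service is assumed to reduce.
    """
    return instance_hours * hourly_rate + setup_hours * staff_hourly_rate


def managed_service_cost(builds, fee_per_build, applications, fee_per_application):
    """Hypothetical cost of a managed service that charges per model build
    and per application of a model to a dataset (an assumed fee structure)."""
    return builds * fee_per_build + applications * fee_per_application


def break_even_applications(self_managed_total, builds, fee_per_build,
                            fee_per_application):
    """Number of model applications at which the managed service costs
    as much as the self-managed setup. Returns 0.0 if the build fees
    alone already exceed the self-managed total."""
    if fee_per_application <= 0:
        raise ValueError("fee_per_application must be positive")
    remaining = self_managed_total - builds * fee_per_build
    return max(0.0, remaining / fee_per_application)
```

With real rates plugged in, the break-even figure indicates roughly how often a model can be applied before the self-managed route becomes cheaper. The sketch ignores data transfer and storage costs, which would matter in practice.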

In any case, Amazon’s entry into the market is guaranteed to spur interest and innovation in the field. Microsoft Excel, for example, continues to make advanced functions more accessible with each release, although at present its machine learning functionality is limited.