Data Tech 2019

MinneAnalytics’ Data Tech 2019 conference was held in Bloomington, MN on May 30 at Normandale Community College. The 16,000+ member local meetup group hosts free conferences for big data, data science and analytics, drawing participants from smaller meetups, such as the Twin Cities Python User Group (around 2,800 members).

From my previous experience presenting at MinneAnalytics’ finance and retail analytics event (FARCON), I knew the free event would fill up within an hour or two after the registration window opened. As soon as I received the announcement, sent to Python meetup group members, I registered on Eventbrite and took a glance at the event schedule hosted by sched.com.

The sched.com platform had good ergonomics, with detailed session descriptions and bios one click below the color-coded titles on the overview landing page. A second click added a session to my personal schedule in advance. However, the schedule was saved only as a session cookie in my desktop browser. To access it later from my phone, I should have created a sched.com login before choosing any sessions. Otherwise, no complaints.

With sessions held in conference rooms and auditoriums flung across campus, there is a trade-off between walking to an interesting session far away and remaining in one room, which leaves more time to network between sessions. The vendor area was full (I’d say 30 vendors), but I didn’t spend any time there, as the central hub area was always crowded between sessions.

It was difficult to avoid sessions that used the word “graph” in the title. There is now a machine learning niche specializing in node/edge models, which measure things like proximity and relationship strength and are used in applications such as search engines and product recommendation algorithms. Graph databases allow very large datasets to be loaded into memory (using an Intel chip with 8GB of Optane memory per CPU core), and graph optimization algorithms allow for fast location of recommended products as well as parallel processing, which apparently is extremely difficult with graph data architecture. Uber uses graphs for distance calculations; Amazon for product recommendations. According to Gartner, graph jumped to fifth place in the list of hottest data science topics.
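
To make the node/edge idea concrete, here is a minimal sketch in Python of a graph-based product recommendation. It is my own toy illustration, not something shown by any presenter: the purchase data and the function names `build_graph` and `recommend` are made up, and the scoring (counting two-hop paths through customers who share a purchase) is just one simple way to express relationship strength.

```python
from collections import Counter, defaultdict

# Hypothetical purchase data: each (customer, product) pair is one edge
# in a bipartite customer/product graph.
PURCHASES = [
    ("ann", "tent"), ("ann", "stove"),
    ("bob", "tent"), ("bob", "stove"), ("bob", "lantern"),
    ("cat", "tent"), ("cat", "lantern"),
    ("dan", "stove"), ("dan", "kettle"),
]


def build_graph(edges):
    """Return an undirected adjacency map: node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for customer, product in edges:
        c_node = ("customer", customer)
        p_node = ("product", product)
        graph[c_node].add(p_node)
        graph[p_node].add(c_node)
    return dict(graph)


def recommend(graph, customer, top_n=3):
    """Rank products the customer does not own by the number of
    customer -> product -> other customer -> product paths reaching them."""
    start = ("customer", customer)
    owned = graph.get(start, set())
    scores = Counter()
    for product in owned:
        for other in graph[product]:
            if other == start:
                continue
            for candidate in graph[other]:
                if candidate not in owned:
                    scores[candidate[1]] += 1
    return scores.most_common(top_n)


if __name__ == "__main__":
    graph = build_graph(PURCHASES)
    print(recommend(graph, "ann"))
```

A real graph database would replace the in-memory dictionary and handle traversal at a much larger scale, but the underlying question is the same: which nodes sit a short, well-connected walk away from the one you started at.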

In terms of hardware, Henry Gabb outlined Intel’s effort to develop a specialized chip to make graph traversal faster and less power intensive. On the database side, Sundeep Vishnumurthy talked about some current options: Neo4j, TigerGraph, AnzoGraph, and DataStax. Dan McCreary proposed that “knowledge engineering” replace “data science”: reusable, explainable model-based machine-learning algorithms applied to graph data will change how data scientists spend their time.

Thanks to all the presenters and to MinneAnalytics for putting together an excellent conference. More information on graphs can be found here and here.
