Scalable Machine Learning Using Python and a Distributed Analytical Database

Python is a leading programming language for machine learning today because of its flexibility, portability, and rich ecosystem of libraries. It also works well with the other analytics tools and frameworks that data scientists rely on. However, Python has scalability limitations that can make it hard to get machine learning models into production, and many machine learning projects stall at the leap from prototype to high-scale production.

Financial institutions hold huge amounts of structured data, which usually resides in distributed data stores. Instead of using Python to extract sample data from those stores to build machine learning models, you can use Vertica to execute Python computations inside the database, where the full dataset resides. This simplifies model training and can improve accuracy by removing the need to downsample. It also greatly speeds up the deployment of models into full-scale production: proven models can be deployed in minutes, not months.

In this session, we will use a credit card fraud detection example to demonstrate how Python can be combined with Vertica, a distributed analytical database, to parallelize and simplify machine learning model training and deployment.


Presented by Badr Ouali, Data Scientist at Vertica



Sign up for this webinar