PAPIs Latam 2019: Full Schedule

11:00 GMT-03

Bias and bugs: implementing recommendations

After modeling and validating with both automated and internal manual tests, we deploy a recommendation system. But did we properly randomize the test groups? Adding a feature to one page but not another already introduces selection bias. In this presentation we will briefly present a neural-network-based recommender, but focus on the issues faced when (a) defining our randomized trial, (b) selecting the test and control groups, (c) coping with the selection bias inherent to the web, (d) choosing which statistical test to apply, and (e) interpreting its results.

Speakers

Guilherme Silveira

Head of Education, Alura

Guilherme co-founded Caelum and Alura, the largest Brazilian online training platform in data science, machine learning and software development. With over 15 years of experience in software development education, he is responsible for innovation, content quality and training at Alura...

Tuesday June 25, 2019 11:00 - 11:20 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Engineering

11:30 GMT-03

TensorFlow image inferencing: an adventure in Python and Go

We tried to use the TensorFlow SDK in our existing Go applications, a natural fit for inference on our deep learning models. However, the Go API is not as well maintained as the Python APIs, and we hit problems that led us to extract our inference code into a standalone Python project.

In Python, it was easy to validate our results, but we faced new challenges: do we make our communication synchronous or asynchronous? Do we use HTTP? REST? gRPC? Message queues? In this presentation we walk through several versions of this project and show how we reached a stable architecture supporting many deep learning models.

Speakers

Vitor De Mario

Tech Lead, NeuralMed

Former organizer of GopherCon Brasil and the Go meetups in São Paulo. Speaker since 2015, including TDC São Paulo and Porto Alegre, GopherCon Brasil 2016, a lightning talk at GopherCon Denver 2017, THECONF and 7Masters. Tech lead of a data science company working on deep learning...

Tuesday June 25, 2019 11:30 - 11:50 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Engineering

12:00 GMT-03

Deploy your Deep Learning models in serverless architectures

You have been given the task of building a deep learning model to detect malaria in pictures of human cells. After this, you:
Prepared the datasets;
Chose the best architecture for the task;
Set up the training environment;
Trained the model;
Tested the model.
What now? Where should you deploy it? Will you need a GPU for inferences? What about auto-scaling? Should you build an API for it?
This presentation will show you what a Serverless Architecture is and how easily you can deploy your Deep Learning models in it, at a very low cost.

Speakers

Adriano Dennanni

Machine Learning Engineer, neuronio.ai

Machine Learning Engineer @ neuronio.ai. Graduated in Computer Engineering @ POLI-USP/Brazil.

Tuesday June 25, 2019 12:00 - 12:20 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Engineering

14:00 GMT-03

K8s-workqueue: Simplified Kubernetes ML Batch Jobs

Managing batch ML jobs is a central competency for Data Science (DS) teams in the ad tech space. According to PwC research, digital ad spend increased by 23% to $50 billion in the first half of 2018. To deal with this growth, DS teams need flexible tools.
We present our k8s-workqueue system, a pluggable scheduling mechanism for ML workloads on Kubernetes, where tens of thousands of models are built every day on our platform. The focus on simplicity led us to design a system that combines familiar features of traditional cron jobs and containers with the power of the Kubernetes API.

Speakers

Chinmay Nerurkar

Senior Software Engineer II, Team Lead, Xandr Inc.

Tuesday June 25, 2019 14:00 - 14:20 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Engineering

14:30 GMT-03

Airflow on Kubernetes: a modern approach to ETL workflows.

ETL workflows are no different from standard software. They should be implemented as code, with automated tests and continuous delivery. They should also be easy to understand, scale, debug, modify and monitor. Apache Airflow provides a framework for designing workflows as Python scripts, along with centralized logs, task status, metrics and a graph view. All these great features come at the price of a steep learning curve and a nasty mix of orchestration bugs and task bugs due to the variety of operators. I'll show how to use only one operator for any ETL workflow and solve that problem.

Speakers

Raphael Sampaio

Engineer, Konduto

Engineer at Konduto, a Brazilian company using Machine Learning for fraud detection. Our algorithm combines geographical, social and behavioral features to deliver an accurate risk measure, increasing customers' profit margins while keeping fraud rates under control.

Tuesday June 25, 2019 14:30 - 14:50 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Engineering

11:00 GMT-03

Fklearn: A functional library for machine learning

Fklearn is a production-ready functional library for machine learning. Fklearn is already being used to power millions of predictions with significant impact on the business bottom line. While it is written in Python, it follows the best practices of functional programming, offering side-effect-free machine learning pipelines. Fklearn is pandas DataFrame-first, with a pragmatic choice of models. It provides advanced encoders, transformations, and realistic evaluation methods. Fklearn has continuous integration including unit tests with high coverage, linting, and static type checking.

Speakers

Henrique Lopes

Machine Learning Engineer, Nubank

Data scientist at Nubank, working with risk models in the credit lines squad. PhD student at Unicamp, working with Bayesian deep learning and uncertainty quantification. Master's degree in adversarial images and variational autoencoders. Previously, worked as a senior data scientist...

Wednesday June 26, 2019 11:00 - 11:30 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Tutorial

11:40 GMT-03

Reproducibility with Data Version Control

Data Version Control (or DVC) is a tool that complements Git, allowing data files to be versioned and shared easily, as well as enforcing the best practices required for reproducible experiments in data science projects. Scientific method aside, it also makes experimentation not only safer but faster. More details at https://dvc.org/

Speakers

Victor Villas Bôas Chaves

Data Engineer, Gupy

Currently a Data Engineer at Gupy, the leading ATS in Brazil. Also a committer to open source tools for data science and data engineering such as pandas and Apache Airflow.

Wednesday June 26, 2019 11:40 - 12:10 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Tutorial

14:00 GMT-03

Validating models in the real world

Validation in the real world goes far beyond a random split or K-fold. When validation performance doesn't match production performance, there are many possible causes, and a likely one is that the validation was done incorrectly.

In this talk, we frame a general idea of what validation means, including some specific cases and how to design a validation scheme for them. By the end, the audience should be able to identify what matters when validating and how to come up with the right strategy to validate any new model they face in the wild.

Speakers

Luis Moneda

Data Scientist, Nubank

Data Scientist at Nubank. Bachelor in economics (FEA-USP) and computer engineering (Poli-USP), MSc in Computer Science student (IME-USP). Interested in machine learning and causal inference.

Wednesday June 26, 2019 14:00 - 14:30 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Tutorial

14:40 GMT-03

ETL Orchestration with AWS Glue and AWS Step Functions

A demonstration showing how the Data Team at Stoodi sped up ETL changes by implementing AWS Step Functions and AWS Lambda on top of AWS Glue to organize the data pipeline as a state machine.

In this demo we want to show how we changed a pipeline that used only AWS Glue into one that uses two more Amazon products, making pipeline modifications easier and improving data consistency and availability for our teams, which makes them more data driven. We used AWS Lambda as the main orchestrator and Step Functions as the state machine service.

Speakers

Alexsandro Francisco dos Santos

Data Engineer, Stoodi

Alexsandro Francisco is a Computer Science student at the Federal University of ABC, Python programmer, data & A.I. enthusiast and robot maker. He has worked with Data Science, Web Scraping and Software Development at a series of startups, learning how data can change business. Currently...

Wednesday June 26, 2019 14:40 - 15:10 GMT-03
Room 9 Av. Rebouças, 3970 - Pinheiros, São Paulo - SP, 05402-600, Brazil

Tutorial
