Elephants, ibises and a more Pythonic way to work with databases
- Room:
- Liffey Hall 1
- Start (Dublin time):
- Start (your time):
- Duration:
- 45 minutes
Abstract
In this talk, I will be sharing about Ibis, a software package that provides a more Pythonic way of interacting with multiple database engines. In my own adventures living in Zimbabwe, Iâve always encountered ibises (the bird versions) perched on top of elephants. If youâve never seen an elephant in real life I can confirm that they are huge, complex creatures. The image of a small bird sitting on top of a large elephant serves as a metaphor for how ibis (the package) provides a less complex, more performant way for Pythonistas to interact with multiple big data engines.
I'll use the metaphor of elephants and ibises to show how this package can make a data workflow more Pythonic. The Zen of Python lets us know that simple is better than complex. The bigger and more complex your data, the more of an argument there is to use Ibis. Raw SQL can be quite difficult to maintain when your queries are very complex. For Python programmers, Ibis offers a way to write SQL in Python that allows for unit-testing, composability, and abstraction over specific query engines (e.g.BigQuery)! You can carry out joins, filters, and other operations on your data in a familiar, Pandas-like syntax. Overall, using Ibis simplifies your workflows, makes you more productive, and keeps your code readable.
TalkPyData: Software Packages & Jupyter
Description
A few weeks ago I was working on setting up a relational database to explore records from DataSFâs Civic Art Collection. Whenever I attend a tech conference I try to spend a day or two in the city to check out its cultural scene, so this seemed like useful information! I decided to use MySQL as my database engine. Coming from a Pandas background I was surprised by how unproductive and restricted I felt writing raw SQL queries. I also spent a significant amount of time resolving errors in queries that worked with one flavor of SQL but failed with MySQL. Throughout the process, I kept thinking to myself if only there was a more Pythonic way!!! A few weeks later I was introduced to Ibis.
I live in Zimbabwe and the first thing that pops into my mind when I think of the word ibis is a safari. One of my favorite things to do when I'm not working is to go on a game drive. Whenever I've been adventuring on safari I usually see ibises perched on top of an elephant. The contrast between the creatures is stark! The African Sacred Ibis is a small, elegant creature that's named after the ancient Egyptian god Thoth. While as many of us know, an elephant is a very big and complex animal. This image serves as a great metaphor for the Python package and how it interacts with big database engines.
Ibis allows you to write intuitive Python code and have that code be translated into SQL. Whether youâre wanting to interact with SQL databases or wanting to use distributed DBMSs, Ibis lets you do this in Python. You can think of the python code as the less complex elegant layer sitting on top of any big data engine of your choice. At the moment, Ibis supports quite a few backends including:
Traditional DBMSs: PostgreSQL, MySQL, SQLite Analytical DBMSs: OmniSciDB, ClickHouse, Datafusion Distributed DBMSs: Impala, PySpark, BigQuery In memory analytics: pandas and Dask.
Anything you can write in an SQL select statement you can write in Ibis. You can carry out joins, filters, and other operations on your data in a familiar, Pandas-like syntax. In this talk, we'll go through several examples of these and compare what the SQL code would look like versus writing to the database with Ibis. Overall, using Ibis simplifies your workflows, makes you more productive, and keeps your code readable.