Writing to a SQLite DB from Pandas

This note demonstrates writing from a Pandas dataframe to a SQLite database.

import pandas as pd
import sqlite3

Read CSV Data into a DataFrame

f = (pd.read_csv('writing-to-sqlite-db-from-pandas/fruit_data_with_colors.txt',
                 sep='\t')
     .rename(columns={'fruit_name':'fruit'}))
f.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   fruit_label    59 non-null     int64  
 1   fruit          59 non-null     object 
 2   fruit_subtype  59 non-null     object 
 3   mass           59 non-null     int64  
 4   width          59 non-null     float64
 5   height         59 non-null     float64
 6   color_score    59 non-null     float64
dtypes: float64(3), int64(2), object(2)
memory usage: 3.4+ KB
f.head()

fruit_label fruit fruit_subtype mass width height color_score
0 1 apple granny_smith 192 8.4 7.3 0.55
1 1 apple granny_smith 180 8.0 6.8 0.59
2 1 apple granny_smith 176 7.4 7.2 0.60
3 2 mandarin mandarin 86 6.2 4.7 0.80
4 2 mandarin mandarin 84 6.0 4.6 0.79

Create the Database Connection and Write to It

%%bash
cd writing-to-sqlite-db-from-pandas
rm -f fruits.db
ls
fruit_data_with_colors.txt

The following line creates the SQLite DB file if it does not already exist.

conn = sqlite3.connect('writing-to-sqlite-db-from-pandas/fruits.db')
%%bash
cd writing-to-sqlite-db-from-pandas
ls
fruit_data_with_colors.txt
fruits.db
f.to_sql('fruits',             # Name of the sql table
         conn,                 # sqlite.Connection or sqlalchemy.engine.Engine
         if_exists='replace')  # If the table already exists, {‘fail’, ‘replace’, ‘append’}, default ‘fail’
conn.close()   # Closing the connection is best practice. This prevents the database from being locked.

Connect to the Database and Read from It

sql_string = '''SELECT * 
                FROM fruits'''
conn = sqlite3.connect('writing-to-sqlite-db-from-pandas/fruits.db')
f_out = pd.read_sql(sql_string,
                    conn,
                    index_col='index')
conn.close()
f_out.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 59 entries, 0 to 58
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   fruit_label    59 non-null     int64  
 1   fruit          59 non-null     object 
 2   fruit_subtype  59 non-null     object 
 3   mass           59 non-null     int64  
 4   width          59 non-null     float64
 5   height         59 non-null     float64
 6   color_score    59 non-null     float64
dtypes: float64(3), int64(2), object(2)
memory usage: 3.7+ KB
f_out.head()

fruit_label fruit fruit_subtype mass width height color_score
index
0 1 apple granny_smith 192 8.4 7.3 0.55
1 1 apple granny_smith 180 8.0 6.8 0.59
2 1 apple granny_smith 176 7.4 7.2 0.60
3 2 mandarin mandarin 86 6.2 4.7 0.80
4 2 mandarin mandarin 84 6.0 4.6 0.79
%%bash
cd writing-to-sqlite-db-from-pandas
sqlite3 fruits.db
.tables
fruits
%%bash
cd writing-to-sqlite-db-from-pandas
sqlite3 fruits.db
.schema fruits
CREATE TABLE IF NOT EXISTS "fruits" (
"index" INTEGER,
  "fruit_label" INTEGER,
  "fruit" TEXT,
  "fruit_subtype" TEXT,
  "mass" INTEGER,
  "width" REAL,
  "height" REAL,
  "color_score" REAL
);
CREATE INDEX "ix_fruits_index"ON "fruits" ("index");