Archive for February, 2016

PostgreSQL database import with Ansible

I had a hard time pulling together all the steps needed to import a PostgreSQL database using Ansible. Here’s the Ansible YAML blocks used to import the seed database for Lexiio.

1. Install PostgreSQL - name: Install Postgres
apt: name={{ item }} update_cache=yes cache_valid_time=3600 state=present
sudo: yes
with_items:
- postgresql
- postgresql-contrib
- libpq-dev
- python-psycopg2
tags: packages

2. Create the database (lexiiodb), UTF-8 for encoding and collation

- name: Create lexiiodb database
sudo_user: postgres
postgresql_db: name=lexiiodb encoding='UTF-8' lc_collate='en_US.UTF-8' lc_ctype='en_US.UTF-8' state=present

3. Create a role that will be granted access to the database (password is a variable read from some secret source)

- name: Create lexiio role for database
sudo_user: postgres
postgresql_user: db=lexiiodb user=lexiio password="{{ password }}" priv=ALL state=present

4. Start the PostgreSQL service

- name: Start the Postgresql service
sudo: yes
service:
name: postgresql
state: started
enabled: true

5. Import data into the database (using psql to pull in data from /home/lexiiodb.dump.sql)

- name: Importing lexiiodb data
sudo_user: postgres
shell: psql lexiiodb < /home/lexiiodb.dump.sql

6. For the role created, grant permissions on all schemas in the DB

- name: Grant usage of schema to lexiio role
sudo_user: postgres
postgresql_privs: database=lexiiodb state=present privs=USAGE type=schema roles=lexiio objs=dictionary

7. For the role created, grant permissions on all tables in the DB

- name: Grant table permissions for lexiio role
sudo_user: postgres
postgresql_privs: database=lexiiodb schema=dictionary state=present privs=SELECT,INSERT,UPDATE type=table roles=lexiio grant_option=no objs=ALL_IN_SCHEMA

8. For the role created, grant permissions on all sequences in the DB

- name: Grant sequence permissions for lexiio role
sudo_user: postgres
postgresql_privs: database=lexiiodb schema=dictionary state=present privs=USAGE type=sequence roles=lexiio grant_option=no objs=ALL_IN_SCHEMA

A more relational dictionary

As I started looking to add more functionality to Lexiio, I realized the Wiktionary definitions database dump I was using wasn’t going to cut it; specifically, I needed a normalized schema, or I’d have data duplication all over the place. I started normalizing in MySQL, but whether it was MySQL or MySQL Workbench, I kept running into character encoding issues. Using a simple INSERT-SELECT, in MySQL 5.7, to transfer words from the existing table to a new table resulted losing characters:

MySQL losing characters

I dumped the data into PostgreSQL, didn’t encounter the issue, and just kept working from there.

The normalized schema can be downloaded here: LexiioDB normalized
(released under the Creative Commons Attribution-ShareAlike License)

LexiioDB schema

The unknown_words and unknown_to_similar_words tables is specific to Lexiio and serve as a place to store unknown words entered by the user and close/similar matches to known words (via the Levenshtein distance).