Categorias
Banco de dados PostGreSQL

Recover and replace win1252 content to utf-8 for a PostgreSQL database

I shouldn't be the first to need to export data from a PostgreSQL database installed on Windows with a win1252 code to a database with UTF-8 code (in my case, on a Linux server).

It is not enough to transform the transfer file to UTF-8, as the characters of win1252 (left double quote, right double quote, single quote and dash) will be there, with a weird value in your database.

In my experience, I had to import data as-is and afterwards perform an update using the function to rewrite data for UTF-8 characters correctly.

The following code examples are for: 1 - transform to HTML characters; 2 - transform to single characters.

HTML:

-- DROP FUNCTION substitui_win1252_html(texto);

CREATE OR REPLACE FUNCTION substitui_win1252_html(texto text)
    RETURNS text AS
$$
BEGIN

    texto := replace(replace(replace(replace(replace(texto, '’', '’'), '“', '“'), '”', '”'), '•','•'), '–', '–')::text; 
    RETURN texto;

END;
$$
LANGUAGE plpgsql;

Simple:

-- DROP FUNCTION substitui_win1252(texto);

CREATE OR REPLACE FUNCTION substitui_win1252(texto text)
    RETURNS text AS
$$
BEGIN

    texto := replace(replace(replace(replace(replace(texto, '’', ''''), '“', '"'), '”', '"'), '•','•'), '–', '-')::text;  
    RETURN texto;

END;
$$
LANGUAGE plpgsql;

Don't worry about the squares that you might see. If you copy it to a good text editor, you will their actual value.

Categorias
PostGreSQL

PostgreSQL – pg_upgrade from 10 to 12

I have some PostgreSQL databases running pretty well but we need to keep our software updated. This is a mandatory practice for a high-quality service. Those servers are running version 10 and they need to be upgraded to version 12. I have used pg_dump / pg_restore strategy for a long time, but this time I would rather use pg_upgrade.

Let's dive into how to do it.

Table of contents:

  1. Install
  2. InitDB
  3. Check upgrade consistency
  4. Set locale
  5. Upgrade
  6. Configurations

Install

The package postgresql12-server contains everything needed to run the server, but my databases use some extensions [1], then I will add postgresql12-devel and postgresql12-contrib to be able to compile and to install the extensions.

yum install postgresql12-server postgresql12-devel postgresql12-contrib

InitDB

After installation we need to setup new server with initdb:

~% /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database … OK

Check upgrade consistency

We need to check compatibility. Turn to postgres user (su - postgres) and run the command:

~% /usr/pgsql-12/bin/pg_upgrade --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-12/bin --old-datadir=/var/lib/pgsql/10/data --new-datadir=/var/lib/pgsql/12/data --check

Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for tables WITH OIDS                               fatal

Your installation contains tables declared WITH OIDS, which is not supported
anymore. Consider removing the oid column using
    ALTER TABLE ... SET WITHOUT OIDS;
A list of tables with the problem is in the file:
    tables_with_oids.txt

Failure, exiting

As you may see, I got a fatal error, indicating that the upgrade is not possible. In my case, tables with OIDs are the culprit. In your case could be something else. In any case, we need to fix before upgrading.

I fixed tables removing OIDs on mentioned tables. And ran check again:

Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for tables WITH OIDS                               ok
Checking for invalid "sql_identifier" user columns          ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

*Clusters are compatible*

Yay!

Set locale

There is a tricky configuration that is not detected by pg_upgrade check but it is very important to me. I use C locale on my databases [2], then I need to perform an extra step. If this is your case, you may follow the same steps applying yours.

I need to stop postgresql10 and start postgresql12:

systemctl stop postgresql-10.service
systemctl start postgresql-12.service

Then I run locale change at my template1 then locale will be enabled when my database will be upgraded.

UPDATE pg_database SET datcollate='C', datctype='C' WHERE datname='template1';

And stop again: systemctl stop postgresql-12.service to be ready to upgrade.

Upgrade

Upgrade command is the same that we run before, without --check flag.

~% /usr/pgsql-12/bin/pg_upgrade --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-12/bin --old-datadir=/var/lib/pgsql/10/data --new-datadir=/var/lib/pgsql/12/data

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for tables WITH OIDS                               ok
Checking for invalid "sql_identifier" user columns          ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Copying user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

Consider running analyze_new_cluster. Optional but nice to have.

vacuumdb: processing database "mydb": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "mydb": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "mydb": Generating default (full) optimizer statistics
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics

Done

Configurations

Before deleting your old cluster, remember to get some of your configurations.

mv /var/lib/pgsql/12/data/pg_hba.conf /var/lib/pgsql/12/data/pg_hba.conf.new
cp /var/lib/pgsql/10/data/pg_hba.conf /var/lib/pgsql/12/data/

I am not a big fan of writing directly to postgresql.conf file. Instead, I keep configuration files under version control and include the directory where those files are deployed. I treat them as code, then it becomes easier to maintain and manage.

Another advantage is that I don't have any mess about config file differences when a new version arises. I am automatically using new default configurations, my customized setting are loaded and I can quickly address any incompatibility caused by migration without touching the original conf file.

Let's go for it:

# Add settings for extensions here
include_dir = '/var/lib/pgsql/conf.d'   # include files ending in '.conf' from

Then you may delete your old cluster. 🙂

Links:

  1. Extensions: https://www.postgresql.org/docs/current/contrib.html
  2. Locale: https://www.postgresql.org/docs/current/locale.html
Categorias
Linux PostGreSQL

Backup de todos os bancos do servidor PostgreSQL

Extremamente útil, ainda mais quando há possibilidade de criação de banco de dados de forma automatizada (servidores de hospedagem, por exemplo).

Download

Categorias
PostGreSQL

Gerando 1000 linhas de dados aleatórios – PostgreSQL

Muito útil!

SELECT i, clock_timestamp() t1, random() r1, random() r2, random() r3, clock_timestamp() + (round(random()*1000)::text || ' days')::interval d1 
from generate_series(1,1000) i(i);
Categorias
PostGreSQL

Substituir conteúdo win1252 para utf-8

Não devo ser o primeiro a precisar exportar dados de um banco PostGreSQL instalado em Windows com codificação win1252 para um banco com codificação em utf-8 (no meu caso, em servidor Linux).

Não basta transformar o arquivo de importação para utf-8, pois os caracteres do win1252 (aspas duplas à esquerda, aspas duplas à direita, aspa simples e travessão) estarão lá, com um valor esquisito no seu banco. A minha solução foi importar assim mesmo e depois realizar um update usando uma função para corrigir.

Os exemplos de código a seguir são para: 1 - transformar para caracteres HTML; 2 - transformar para os caracteres simples.

HTML:

Simples:

Não se preocupe com os quadrados que aparecem. Se você copiar para um bom editor de texto, verá que possuem valores diferentes.