Cockroach is an open source implementation along the lines of Google Spanner. I wanted to try it out as described in Getting cockroachdb running on google cloud platform , so I wanted to see if it was possible to automate the import of a sizeable MySql database from a DDL file.
DifferencesThere are a whole lot of syntactical differences, so the input data needed lots of massaging. Cockroach is close to PostgreSQL so I started from this .
I also spent a bit of time on the issues section of the cockroach repo and picked up many edits that others had suggested there too.
ToolsI've been using Unix off and on for almost 40 years. In fact, I used to work at the company that invented it, and the philosophy in the wikipedia entry for Unix, is one of the main reasons for the success of *x in all its flavours, including Linux."the use of a large number of software tools, small programs that can be strung together through a command-line interpreter using pipes, as opposed to using a single monolithic program that includes all of the same functionality. These concepts are collectively known as the "Unix philosophy". Brian Kernighan and Rob Pike summarize this in The Unix Programming Environment as "the idea that the power of a system comes more from the relationships among programs than from the programs themselves". "
Having said that, my relationship with Unix/Linux has been very much more off than on, so embarking on this automation task has been both a "remembering experience" and a "learning experience", but in any case i wanted to stick to the basic tools available in Linux, namely bash , grep, awk and sed. Let's hope that some of this might be useful to others undertaking the same task.
CockroachDBI covered this in Getting cockroachdb running on google cloud platform , but initially I'm using Docker running on Compute Engine - just a single VM, but multiple nodes in separate Docker containers.
Converting and loading the ddlThere are 3 parts to this
load.sh runs the entire process
Splitting the ddlsplitpart.sh just passes the input and output filenames to alters.sh. The original mySql export is in e.sql-orig
alters.sh creates the separate parts as follows.
Patching the syntaxnow we have a table-creates.sql, a table-alters.sql and a set of inserts, but they are still in MySql syntax so the next step is to patch them to valid cockroachsyntax
patpart.sh does this
patcdb.sh converts from mysql to cockroach - most of these edits came from the issues section of cockroach git.
patinserts.sh - MySql surrounds names with backticks ( I realized later there was a mysql export option to prevent this, but quoting names in cockroach preserves the case, so I decided to leave this in). Cockroach doesn't like dates expressed as 0000-00-00, so best to change these to NULL.
patbools.sh - I'm using the option in MySql that sends across binary values as hex encoded. In the case of BOOL, they would have been defined as BIT(1) in MySQL, so 0x00 and 0x01 need to be converted to true or false.
Loading the data Now there's a table and alters sql file to build the database and add the contraints, along with a collection of files, one for each table, containing the values to be inserted. dockpart.sh is in 2 parts.
Splitting the ddl
sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sqlsh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sqlrm inserts/*e-inserts.sqlF=inserts/*e-inserts.sql-unpatched
sed -E -e 's/`/"/g' \
sed -E -e 's/`/"/g' -e "s/\\\'/''/g" -e "s/\),/\),\n/g" $1 > $2
sed -E -e 's/`/"/g' \ -e "s/float\([0-9,]+\)/FLOAT/g" \ -e "s/binary\([0-9]+\)/BYTES/Ig" \ -e 's/int\(.*\)/int/' \ -e 's/DEFAULT CHARSET=ascii//' \ -e 's/ENGINE=InnoDB //' \ -e 's/DEFAULT CHARSET=utf8//' \ -e 's/DEFAULT NULL//' \ -e 's/^LOCK/--LOCK/' \ -e 's/^UNLOCK/--UNLOCK/' \ -e 's/COMMENT=.*$/;/' \ -e 's/COMMENT\ .*$/,/' \ -e 's/int NOT NULL AUTO_INCREMENT/SERIAL NOT NULL/' \ -e 's/AUTO_INCREMENT=.* //' \ -e 's/COLLATE utf8_unicode_ci//' \ -e 's/COLLATE=utf8_unicode_ci//' \ -e '/PRIMARY/!s/KEY/index/' \ -e 's/tinyint/SMALLINT/' \ -e 's/int NOT NULL DEFAULT \x270\x27/int NOT NULL DEFAULT 0/' \ -e 's/USING BTREE//' \ -e 's/\x27\\\x27\x27/e\x27\\x27\x27/g' \ -e 's/mediumtext/text/g' \ -e 's/FOREIGN index/FOREIGN KEY/g' \ -e 's/bit\(1\)/BOOL/g' \ -e "s/b'0'/false/g" \ -e "s/b'1'/true/g" \ -e 's/datetime/TIMESTAMP/g' \ -e 's/tinyblob/BYTES/g' \ -e 's/ double / DOUBLE PRECISION /g' \ -e "s/timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'/TIMESTAMP/g" \ -e 's/ ON UPDATE CURRENT_TIMESTAMP//g' \ -e 's/ CHARACTER SET ascii //g' \ -e 's/ int unsigned NOT NULL AUTO_INCREMENT/ BIGSERIAL NOT NULL/g' \ $1 > $2 -e "s/float\([0-9,]+\)/FLOAT/g" \
-e "s/binary\([0-9]+\)/BYTES/Ig" \
-e 's/int\(.*\)/int/' \
-e 's/DEFAULT CHARSET=ascii//' \
-e 's/ENGINE=InnoDB //' \
-e 's/DEFAULT CHARSET=utf8//' \
-e 's/DEFAULT NULL//' \
-e 's/^LOCK/--LOCK/' \
-e 's/^UNLOCK/--UNLOCK/' \
-e 's/COMMENT=.*$/;/' \
-e 's/COMMENT\ .*$/,/' \
-e 's/int NOT NULL AUTO_INCREMENT/SERIAL NOT NULL/' \
-e 's/AUTO_INCREMENT=.* //' \
-e 's/COLLATE utf8_unicode_ci//' \
-e 's/COLLATE=utf8_unicode_ci//' \
-e '/PRIMARY/!s/KEY/index/' \
-e 's/tinyint/SMALLINT/' \
-e 's/int NOT NULL DEFAULT \x270\x27/int NOT NULL DEFAULT 0/' \
-e 's/USING BTREE//' \
-e 's/\x27\\\x27\x27/e\x27\\x27\x27/g' \
-e 's/mediumtext/text/g' \
-e 's/FOREIGN index/FOREIGN KEY/g' \
-e 's/bit\(1\)/BOOL/g' \
-e 's/datetime/TIMESTAMP/g' \
-e 's/tinyblob/BYTES/g' \
-e 's/ double / DOUBLE PRECISION /g' \
-e "s/timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'/TIMESTAMP/g" \
-e 's/ ON UPDATE CURRENT_TIMESTAMP//g' \
-e 's/ CHARACTER SET ascii //g' \
-e 's/ int unsigned NOT NULL AUTO_INCREMENT/ BIGSERIAL NOT NULL/g' \
$1 > $2
inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql rm inserts/$b-e-inserts.sql-bytesindone#BYTES need very special treatment#B=$(awk 'BEGIN {RS=";"}/BYTES/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')#echo "patching byte inserts"#for b in $B#do #need to copy the patched one, as some may be both bool and bytes #cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin #sh patbytes.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql #rm inserts/$b-e-inserts.sql-bytesin#done
rm inserts/*e-inserts.sqlF=inserts/*e-inserts.sql-unpatched
sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sqlsh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sqlrm inserts/*e-inserts.sqlF=inserts/*e-inserts.sql-unpatchedfor fx in $Fdo P=$(echo $fx | grep -Eo ".*\.sql") sh patinserts.sh $fx $PdoneB=$(awk 'BEGIN {RS=";"}/BOOL/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')echo "patching bool inserts"for b in $Bdo #need to copy the patched one, cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin sh patbools.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql rm inserts/$b-e-inserts.sql-bytesindone#BYTES need very special treatment#B=$(awk 'BEGIN {RS=";"}/BYTES/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')#echo "patching byte inserts"#for b in $B#do #need to copy the patched one, as some may be both bool and bytes #cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin #sh patbytes.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql #rm inserts/$b-e-inserts.sql-bytesin#done
# this splits it up into Create, constrainsts and inserts# 1 .. intput filr# 2 .. table definition# 3 .. constraints# 5 .. insert fileegrep -Ei "CREATE TABLE[^\`]+" $1 | grep -oEi "\`[^\`]+\`" | grep -iEo '[^\`]*' > e-t.u
DB=$(grep -iE "CREATE DATABASE" $1 | grep -oiE "\`.*+\`" | grep -oiE "[^\`]+") sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sqlsh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sql # redefine the tables without the constraints echo "creating $2
"echo "" > $2;
while read LINE
do
awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iEv "constraint" | sed -Ee 'N;/\n\)/s/,\)*\n/\n/;P;D' >> $2
done < "e-t.u"
# inserts
rm "inserts/*-$4"
echo "creating $4"
echo "" > $3;
while read LINE
do N="inserts/$LINE-$4"
echo "creating $N"
awk "BEGIN{IGNORECASE=1} /INSERT INTO \`$LINE\`/,/;/" $1 | sed -E -e "s/INSERT INTO \`$LINE\` VALUES//" > $N
done < "e-t.u"
# create an alters sql
echo "creating $3"
echo "" > $3;
while read LINE
do
awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iE \
"constraint" | sed \
-Ee "s/CONSTRAINT/ALTER TABLE \`$LINE\` ADD CONSTRAINT/Ig" | sed -Ee "N;/\n\)/s/,\)*\n/\n/;P;D" | sed -Ee "s/$/;/g" >> $3
done < "e-t.u"
# because a ddl file could be in the wrong order for dependenices# this splits it up into Create, constrainsts and inserts# 1 .. intput filr# 2 .. table definition# 3 .. constraints# 5 .. insert fileegrep -Ei "CREATE TABLE[^\`]+" $1 | grep -oEi "\`[^\`]+\`" | grep -iEo '[^\`]*' > e-t.uDB=$(grep -iE "CREATE DATABASE" $1 | grep -oiE "\`.*+\`" | grep -oiE "[^\`]+")## create an alters sqlecho "creating $3"echo "" > $3;while read LINE do awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iE \ "constraint" | sed \ -Ee "s/CONSTRAINT/ALTER TABLE \`$LINE\` ADD CONSTRAINT/Ig" | sed -Ee "N;/\n\)/s/,\)*\n/\n/;P;D" | sed -Ee "s/$/;/g" >> $3 done < "e-t.u"# redefine the tables without the constrainstecho "creating $2"echo "" > $2;while read LINE do awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iEv "constraint" | sed -Ee 'N;/\n\)/s/,\)*\n/\n/;P;D' >> $2 done < "e-t.u"# insertsrm "inserts/*-$4"echo "creating $4"echo "" > $3;while read LINE do N="inserts/$LINE-$4" echo "creating $N" #echo "DELETE FROM \`$LINE\` WHERE true;" > $N awk "BEGIN{IGNORECASE=1} /INSERT INTO \`$LINE\`/,/;/" $1 | sed -E -e "s/INSERT INTO \`$LINE\` VALUES//" > $N done < "e-t.u"
# create separate alter and table filesh alters.sh e.sql-orig \ e-table-creates.sql-unpatched \ e-table-alters.sql-unpatched \ e-inserts.sql-unpatched#split the file into creation.constraint.insertssh splitpart.sh#patch the syntaxsh patpart.sh#load to dbsh dockpart.sh
|
Services > Desktop Liberation - the definitive resource for Google Apps Script and Microsoft Office automation > Google cloud platform >