Things I learned while restoring multi-million rows table in mysql

By in articles, Misc., Server Admin on September 2, 2011

Recently someone contacted me to restore a huge database to their new servers.The database dump file was about 5GB having around 13 millions records.Each row included a huge TEXT field.
The new server was a dedicated server from Godaddy.
The initial step was not so hard. It is a good practice to disable indexes while importing huge database and when all the rows are done inserting enable the indexes. all the command to do this are already included in the dump file.
So i run the following command from the shell.

$ mysql -u dbuser -p dbpass --database=dbname < /path/to/dump/file

It sure was going to take too long and I could not be waiting for the prompt to return.Also I needed to make sure that the process continues even after I disconnect from the terminal.
So I used nohup so that it continues to execute:

$ nohup mysql -u dbuser -p dbpass --database=dbname < /path/to/file &


Waited for about 7 hours for the records to be inserted into the table. Also when a table is being written into, the table is locked. UPDATE and INSERT queries have higher priority than SELECT. So when the above command was executing I could not check how many records was updated using the SELECT query.
One way to make select work was making this query in mysql client:

mysql > SELECT HIGH_PRIORITY COUNT(*) FROM table;

This sets the priority of SELECT above INSERT and UPDATE. But this was not working for me.
so I did the following to check how many records have been inserted so far.

mysql > SHOW PROCESSLIST;

This is a mysql command that shows the processes being executed. This showed the INSERT statement that was being executed at the moment. I could see the rowid in the query. That tells how many records have already been inserted.
I set a watch command to check this every 60 seconds. Wrote a small file “watchdb” with this code

mysql -u user -p pass --database=dbname -e 'SHOW PROCESSLIST;'

And execute following commands

$ chmod +x watchdb
$ watch -d -n 60 ./watchdb

So far around 100000 records were being inserted a minute.

The really sluggish process came when the records were finished being inserted. Now the indexes were to be enabled.I could see in processlist the table was being repaired, but using key cache.
Sorting MyISM tables using keycache is a very sluggish process.
I had to make it use file sort so that i could get it done in short time.
You can read more about difference between “repair using keycache” and “file sort” here.
So I killed the repair process. And set the myisam_max_sort_file_size. And restart the repair.

mysql> SET GLOBAL myisam_max_sort_file_size=100000 * 1024 * 1024;
mysql> REPAIR TABLE table;

Now it uses file sort and the process finished in less than 10 minutes.
So around 13 millions records inserted and re-indexed in around 7 hours time. I feel like the insert process took it long to execute. May be because Godaddy servers dont allow you to run long processes.
Then updated the /etc/my.cnf file so that next time file sort will be used. add following lines to my.cnf:

myisam_sort_buffer_size=256M
key_buffer_size=1G
myisam_max_sort_file_size=100G

and restart mysql.
You can read more about optimising your mysql server in this discussion here.

Tags: , , , , ,

2 Responses to “Things I learned while restoring multi-million rows table in mysql”

  1. Sucl says:

    Nice article. Keep em coming :)

  2. budopunk says:

    gud work there with the information……….;)

Leave a Reply