VM Consolidation – Survival Guide

VM Consolidation – Survival Guide

survival-guide

Survival guide for any vm snaphost consolidation problems all in one place :

Note! Make sure any backup software is turned off or that all jobs are stopped. A reboot of the backup server is required to clear any potential residual locks.

  1. Restart vc service – https://kb.vmware.com/kb/1003895
  1. Restart the management agents on the ESXi cluster where problematic vms are working

#services.sh restart   – https://kb.vmware.com/kb/1003490, or manually verify to determine “who” is holding the lock

3. Use vmfstools (-D) command against vm snapshot files:

/vmfs/volumes/<datastore># vmkfstools -D <file name>
You see an output similar to:

[root@test-esx1 testvm]# vmkfstools -D test-000008-delta.vmdk
Lock [type 10c00001 offset 45842432 v 33232, hb offset 4116480
gen 2397, mode 2, owner 00000000-00000000-0000-000000000000mtime 5436998]<————–MAC address of lock owner
RO Owner[0] HB offset 3293184 xxxxxxxx-xxxxxxxx-xxx-xxxxxxxxxxxx <——————————MAC address of read-only lock owner
Addr <4, 80, 160>, gen 33179, links 1, type reg, flags 0, uid 0, gid 0, mode 100600
len 738242560, nb 353 tbz 0, cow 0, zla 3, bs 2097152

//more information in kb: https://kb.vmware.com/kb/10051

If  esxi holding lock you can restart mgmt agents as per above advice or migrate all vms and reboot host or determine which process is holding the lock – just run one of these commands:

# lsof file

# lsof | grep -i file

For example:

# lsof | grep test02-flat.vmdk

You should see an output similar to:

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME

71fd60b6- 3661 root 4r REG 0,9 10737418240 23533 Test02-flat.vmdk

Check the process with the PID returned in above, in our  example:

# ps -ef | grep 3661

to kill the process, run the command:

# kill

All in all when we solve “locks” problems we can continue vm consolidation process :

  1. Connect to the ESXi where is problematic vm directly
  1. Power off problematic vm
  1. Disable CBT for the virtual machine (very ofter ctk files are corrupt, for example we run backup job on vm with active snapshot – this is unsupported config) For more information, see: http://kb.vmware.com/kb/1031873

6.Remove  any files ending with the *-ctk.vmdk file extension in the virtual machine directory.

  1. Enable CBT for the virtual machine again, see: http://kb.vmware.com/kb/1031873
  1. Remove and add vm to inventory (just to verify vm configuration integrity, in case any vmx problems you got error message and you need correct vm config), more information in kb: https://kb.vmware.com/kb/1003743
  1. Create a snapshot:

Right-click the virtual machine.

Click Snapshot.

Click Take Snapshot.

  1. Perform a Delete All operation:

Right-click the virtual machine.

Click Snapshot.

Click Snapshot Manager.

Click Delete All.

TIP:  To verify snapshots are rejoining run the commands:

#watch “ls -lhut –time-style=full-iso *-delta.vmdk”

#watch “ls -lh –full-time *-delta.vmdk *-flat.vmdk”

//more info in kb: https://kb.vmware.com/kb/1007566

  1. Power on vm and verify fix

 

However if above do not work/solve the problem we have two alternate options:

  1. a) clone or storage vmotion problematic vm’s to different datastore
  1. b) use VMware converter and perform v2v operation

That’s it – my survival guide for any vm snapshot consolidation problems – wondering if you have any add ons or different approach view ?

Comments are closed.