VMware Virtual SAN 6.6 what’s new

vSAN 6.6 is the sixth generation of the product, and this release brings more than 20 new features and enhancements, such as:

  • Native encryption for data-at-rest
  • Compliance certifications
  • Resilient management independent of vCenter
  • Degraded Disk Handling v2.0 (DDHv2)
  • Smart repairs and enhanced rebalancing
  • Intelligent rebuilds using partial repairs
  • Certified file service & data protection solutions
  • Stretched clusters with local failure protection
  • Site affinity for stretched clusters
  • 1-click witness change for Stretched Cluster
  • vSAN Management Pack for vRealize
  • Enhanced vSAN SDK and PowerCLI
  • Simple networking with Unicast
  • vSAN Cloud Analytics with real-time support notification and recommendations
  • vSAN Config Assist with 1-click hardware lifecycle management
  • Extended vSAN Health Services
  • vSAN Easy Install with 1-click fixes
  • Up to 50% greater IOPS for all-flash with optimized checksum and dedupe
  • Support for new next-gen workloads
  • vSAN for Photon in Photon Platform 1.1
  • Day 0 support for latest flash technologies
  • Expanded caching tier choice
  • Docker Volume Driver 1.1

 

OK, now let’s review the main enhancements:

vSAN 6.6 introduces the industry’s first native HCI security solution. vSAN now offers data-at-rest encryption that is completely hardware-agnostic. No more concern about someone walking off with a drive, or breaking into a less-secure edge IT location and stealing hardware. Encryption is applied at the cluster level, and any data written to a vSAN storage device, at both the cache layer and the persistent layer, can now be fully encrypted. vSAN 6.6 also supports two-factor authentication, including SecurID and CAC.

Certified file services and data protection solutions are available from 3rd party partners in the VMware Ready for vSAN Program to enable customers to extend and complement their vSAN environment with proven, industry-leading solutions. These solutions provide customers with detailed guidance on how to complement vSAN. (EMC NetWorker is available today, with new solutions coming soon.)

vSAN stretched cluster was released in Q3’15 to provide an Active-Active solution. vSAN 6.6 adds a major new capability that will deliver a highly-available stretched cluster that addresses the highest resiliency requirements of data centers. vSAN 6.6 adds support for local failure protection that can provide resiliency against both site failures and local component failures.

PowerCLI Updates: Full featured vSAN PowerCLI cmdlets enable full automation that includes all the latest features. SDK/API updates also enable enterprise-class automation that brings cloud management flexibility to storage by supporting REST APIs.

The VMware vRealize Operations Management Pack for vSAN, released recently, provides customers with native integration for simplified management and monitoring. The vSAN management pack is specifically designed to accelerate time to production with vSAN, optimize application performance for workloads running on vSAN and provide unified management for the Software Defined Datacenter (SDDC). It provides additional options for monitoring, managing and troubleshooting vSAN along with the end-to-end infrastructure solutions.

Finally, vSAN 6.6 is well suited for next-generation applications. Performance improvements, especially when combined with new flash technologies for write-intensive applications, enable vSAN to address more emerging applications like Big Data. The vSAN team has also tested and released numerous reference architectures for these types of solutions, including Big Data, Splunk and InterSystems Cache.

RESOURCES:

  • Splunk Reference Architecture: http://www.emc.com/collateral/service-overviews/h15699-splunk-vxrail-sg.pdf
  • Citrix XenDesktop/XenApp Blog: https://blogs.vmware.com/virtualblocks/2017/02/27/citrix-xenapp-xendesktop-7-12-vmware-vsan-6-5-flash/
  • vSAN, VxRail and Pivotal Cloud Foundry RA: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-pcf-vxrail-reference-architeture.pdf
  • vSAN and InterSystems Blog: https://community.intersystems.com/post/intersystems-data-platforms-and-performance-%E2%80%93-part-8-hyper-converged-infrastructure-capacity
  • Intel, vSAN and Big Data Hadoop: https://builders.intel.com/docs/storagebuilders/Hyper-Converged_big_data_using_Hadoop_with_All-Flash_VMware_vSAN.pdf

 

 

vCenter 6.5 DSN permissions

Recently we had some strange problems with our 6.5 lab vCenter (Windows version with an MSSQL Server database), which crashed frequently. After some digging in the vpxd logs, it seems to be related to the vCenter database permissions:

17-05-28T19:36:53.443+02:00 error vpxd[05420] [Originator@6876 sub=Default] [VdbStatement] SQLError was thrown: “ODBC error: (42000) – [Microsoft][SQL Server Native Client 11.0][SQL Server]VIEW SERVER STATE permission was denied on object ‘server’, database ‘master’.” is returned when executing SQL statement “SELECT DB_NAME(mf.DATABASE_ID) Db_Name, CASE mf.FILE_ID WHEN 1 THEN ‘DATA’ WHEN 2 THEN ‘LOG’ END File_Type, vol.VOLUME_MOUNT_POINT AS Drive, CONVERT(INT,vol.AVAILABLE_BYTES/1048576.0) FreeSpaceInMB, (mf.SIZE*8)/1024 VCDB_Space_Mb, mf.PHYSICAL_NAME Physical_Name, SERVERPROPERTY(‘edition’) Sql_Server_Edition, SERVERPROPERTY(‘productversion’) Sql_Server_Version FROM SYS.M” action.

The SQL statement fails because the vCenter Server database user has no permissions on the ‘master’ database. To resolve this issue, grant additional privileges to the vCenter Server database user:

USE master
GO
GRANT VIEW SERVER STATE TO [vCenter_database_user]
GO
GRANT VIEW ANY DEFINITION TO [vCenter_database_user]
GO

 

vCenter Appliance 6.0 U3 email notifications are not sent when multiple email addresses are defined in an alarm action

Recently I tried to configure email notifications on my lab vCenter Server Appliance (6.0u3), but ran into the following issue:

 “Diagnostic-Code: SMTP;550 5.7.60 SMTP; Client does not have permissions to send as this sender”

I tried to use the solution from the KB: https://kb.vmware.com/kb/2075153 but apparently that solution does not work with the latest 6.0.x appliance!

After some research and digging deeper (header analysis), it seems that the root cause was an invalid return path in the email header. To resolve this you need to edit two system files:

1. SSH to the VCSA and enable the shell:

Command> shell.set --enabled True

Command> shell

2. Go to the /etc/sysconfig directory:

mail1

3. Edit the “mail” file using vi and make the change shown in the screenshot below:

# vi mail

mail2

  • Simply verify the change using cat:

mail3

4. In the same directory, edit the “sendmail” file and add a domain name to “SENDMAIL_GENERICS_DOMAIN=”:

mail4

5. Next, go to the /etc/mail directory and add a user to mask root in “genericstable”:

mail5

6. Regenerate the table:

# makemap -r hash /etc/mail/genericstable.db < /etc/mail/genericstable

7. Create the sendmail.mc file:

# /sbin/conf.d/SuSEconfig.sendmail -m4 > /sendmail.mc

Note: Do not edit the “sendmail” file as in the above procedure.

8. Check whether the “sendmail.cf” file exists in the /etc directory. If it does, rename it:

# mv /etc/sendmail.cf /etc/sendmail.cf.orig

9. Create a new config file:

# m4 /sendmail.mc > /etc/sendmail.cf

10. Open the “sendmail.cf” config file (vi) and add the IP address of the SMTP/Exchange server in your environment (DS[xxx.xxx.xxx.xxx]):

mail6

11. Restart the sendmail service:

# /etc/init.d/sendmail restart

 

Now it should work fine!

vRealize Automation 7 Installation – minimal deployment

Some might find it quite difficult to install a new product for the first time, even if they installed a previous version in the past (or particularly because of that!). That’s due to changes in requirements, or even to a different structure of the wizard forms, which require information in a different format. E.g. the domain administrator account is required in administrator@domain.local format in one version, whilst another installation accepts only the domain\administrator format.

That is why I’m going to provide the list of basic steps to install vRealize Automation 7 in a minimal deployment.

I assume that you have successfully deployed the vRA appliance and prepared IaaS VM.

  1. I suggest performing the initial configuration from the IaaS VM, which will be helpful when installing the IaaS components later in the process. After accessing the vRA appliance console at port 5480 and providing the default password, the wizard will start. At this stage you can choose the deployment type. Below you can see the description of the enterprise deployment. I will choose the minimal deployment for now, though.

vrainstall2 vrainstall3

2. Now it’s high time to prepare the IaaS server for installation of the IaaS components. In vRA 7.x this is a much simpler and faster process. You only need to install the agent so that the wizard can discover the server and then perform the necessary steps.

vrainstall4

3. Here you can see that the server is discovered and you can move forward.

vrainstall5

4. The next step is to check all the prerequisites and fix them if necessary.

vrainstall6

5. Here you can see that there is a lot of work to do on a freshly installed Windows Server.

vrainstall7

6. Fortunately, the wizard will do the job by itself. That’s the time for a short break 😉 It takes approximately 15 minutes to perform all the steps.

vrainstall8

7. When all tasks are done you can re-run the verification script to confirm that everything is fine and move to the next step.

vrainstall9

8. Provide the FQDN of the vRA VM.

vrainstall10

9. Here you are going to create a password which will be used by the system administrator account.

vrainstall11

10. The next step is to provide the IaaS information, and here is the time to use the domain\username format for the IaaS Administrator account, which should have local admin rights assigned.

vrainstall12

11. Here you are going to provide information about the database server. I highly recommend creating a new database. Keep in mind that the IaaS administrator account which you provided in the previous step must have sysadmin rights on the database server. Without that, the wizard will still move forward, but the installation will fail at the final step.
vrainstall14

12. For a minimal or PoC deployment you can simply leave the default values here.

vrainstall15

13. As in the previous step, leave the values unchanged. Just note the exact agent name, which will be required during endpoint creation.

vrainstall16

14. Provide the information for the vRA self-signed certificate, unless you want to use a custom one.

vrainstall17

15. Provide the information for the Web self-signed certificate, unless you want to use a custom one.

vrainstall18

16. Provide the information for the Manager service certificate.

vrainstall19

17. Run the validation which might take about 10 minutes.
vrainstall21

18. Now it’s time for some kind of backup in case something goes wrong. I suggest taking a snapshot of the IaaS server just in case. Even though the validation stage completed successfully, there are some issues which could happen during installation. (I’ve faced a failed installation due to a lack of appropriate database permissions.)

vrainstall22

19. Next, just start the installation.
vrainstall23

20. If you didn’t miss anything, your installation should be successful 🙂
vrainstall25

21. Next you should provide the license key.

vrainstall26

22. Deselect the VMware CEIP agreement checkbox.

vrainstall27

23. You can also provide the password for the initial configuration content. It’s especially useful in the case of a PoC installation. By clicking the Create Initial Content button, a blueprint for default automation on a basic vRA configuration will be created and published in the default tenant service catalog.

vrainstall28

24. After that you will see the installation confirmation. Now you can start to play and begin the journey with your vRA!
vrainstall29

Howto – Using Gmail as an Email server in vRealize Automation

Sometimes it is not possible to use a corporate e-mail account or deploy a dedicated e-mail server, especially when it comes to a lab environment (and you are a little bit too lazy, like me, to do it ;)). A workaround which I found is to configure my personal/fictional accounts provided by Gmail. It is more than enough if you just want to see how notification or approval workflows work. It might not be enough if you want to deploy a slightly bigger deployment with a few Business Groups and users, though. Of course there are plenty of other ways, or small mail servers which you can deploy in a few minutes. I find Gmail much more intuitive, though.

Anyway, I’m going to show you what the Inbound and Outbound server configurations should look like.

Keep in mind that the outbound server is used to send notifications from vRA to users, managers, etc. The inbound server is used by vRA to receive special kinds of notifications from users. For instance, when you are a Business Group Manager and you want to approve a request via e-mail without opening your vRA portal, you can simply respond to the e-mail notification you received by clicking the hyperlink provided in that message. In this case you need an Inbound server configured.

Outbound Email configuration:

outbound

Inbound Email configuration:

Inbound

 

Note. Be aware that you have to change the restriction policy on your Gmail account – after you click Test Connection in vRA’s e-mail configuration window, you will receive an e-mail with detailed steps showing which setting to change.

ESXi and Likewise – troubleshooting guide – part 2

In the previous part of this small series, we discussed the theoretical background of the components and technology related to adding an ESX host to a Windows AD environment. Now it is time to describe the troubleshooting options and some real-life problems with solutions.

Let’s start by dividing all ESXi/Likewise issues into categories:

  1. Domain Join Failures

Here are the most common reasons why an attempt to join a domain fails:

  • The user name or password of the account used to join the domain is incorrect.
  • The name of the domain is mistyped.
  • The name of the OU is mistyped.
  • The local hostname is invalid.
  • The domain controller is unreachable from the client because of a firewall or because the NTP service is not running on the domain controller.
  • Verify that the Name Server Can Find the Domain

# nslookup <AD Domain>

  • Make Sure the Client Can Reach the Domain Controller

Verify that the ESX host can reach the domain controller by pinging it.

  • Verify that Outbound Ports Are Open
  • Port 88 – Kerberos authentication
  • Port 123 – NTP
  • Port 135 – RPC
  • Port 137 – NetBIOS Name Service
  • Port 139 – NetBIOS Session Service (SMB)
  • Port 389 – LDAP
  • Port 445 – Microsoft-DS Active Directory, Windows shares (SMB over TCP)
  • Port 464 – Kerberos change/set password
  • Port 3268 – Global Catalog search
  • Check DNS Connectivity

Make sure the nameserver entry in /etc/resolv.conf contains the IP address of a DNS server that can resolve the name of the domain you are trying to join.

  • Make Sure nsswitch.conf Is Configured to Check DNS for Host Names

The /etc/nsswitch.conf file must contain the following line:

hosts: files dns

  • Ensure that DNS Queries Are Not Using the Wrong Network Interface Card

If the ESX host is multi-homed, the DNS queries might be going out the wrong network interface card. Temporarily disable all the NICs except for the card on the same subnet as your domain controller or DNS server and then test DNS lookups to the AD domain. If this works, re-enable all the NICs and edit the local or network routing tables so that the AD domain controllers are accessible from the host.

  • Determine Whether the DNS Server Is Configured to Return SRV Records

Your DNS server must be set to return SRV records so the domain controller can be located. It is common for non-Windows (bind) DNS servers to not be configured to return SRV records.

Diagnose by executing the following command:

nslookup -q=srv _ldap._tcp.ADdomainToJoin.com

  • Make Sure that the Global Catalog Is Accessible

The global catalog for Active Directory must be accessible. Diagnose by executing the following command:

nslookup -q=srv _ldap._tcp.gc._msdcs.ADrootDomain.com

From the list of IP addresses in the results, choose one or more addresses and test whether they are accessible on Port 3268 by using telnet.

  • Verify that the Client Can Connect to the Domain on Port 123

Windows time service must be running on the domain controller.

On a Linux computer, run the following command as root:

ntpdate -d -u DC_hostname
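
The TCP port checks from the list above can be scripted instead of being tested one by one with telnet. Below is a minimal Python sketch, not an official tool: the names AD_TCP_PORTS, tcp_connect and check_ports are made up for this example. NTP (123) and NetBIOS Name Service (137) are left out on purpose, because they normally use UDP and a TCP connect test says nothing about them.

```python
import socket

# TCP ports taken from the list above (names are illustrative only).
# 123 (NTP) and 137 (NetBIOS Name Service) are UDP services and are skipped.
AD_TCP_PORTS = {
    88: "Kerberos authentication",
    135: "RPC",
    139: "NetBIOS Session Service (SMB)",
    389: "LDAP",
    445: "SMB over TCP",
    464: "Kerberos password change",
    3268: "Global Catalog search",
}


def tcp_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=AD_TCP_PORTS, connect=tcp_connect):
    """Map every port number to True (reachable) or False (not reachable).

    The connect argument is injectable so the logic can be tried out
    without touching the network.
    """
    return {port: connect(host, port) for port in sorted(ports)}
```

A call such as check_ports("dc01.example.com") (hypothetical host name) returns a dictionary keyed by port number, so blocked ports are easy to spot. A False result only tells you the connection could not be opened from where the script runs; it does not say which firewall is responsible.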

  2. Log-in/Authentication issues
  • Make Sure You Are Joined to the Domain

Check ‘lw-lsa get-status’

  • Clear the Cache

Clear the cache to ensure that the client computer recognizes the user’s ID.

# ad-cache --delete-all

Clear the Likewise Kerberos cache to make sure there is not an issue. Execute the following command at the shell prompt with the user account that you are troubleshooting:

~# kdestroy

  • Check the Status of the Likewise Authentication Daemon

#/etc/init.d/lsassd status

  • Check Communication between the Likewise Daemon and AD

Verify that you can ping the DC from the ESX host.

  • Make Sure the AD Authentication Provider Is Running

# lw-lsa get-status

If the result does not include the AD authentication provider, or indicates that it is offline, restart the authentication daemon.

  • Check whether you can log on with SSH by executing the following command:

ssh DOMAIN\\username@localhost

  3. Lsassd crashes due to various reasons, such as during trust enumeration
  • analyze the lsassd, netlogond and lwiod logs to see exactly where the likewise daemon is crashing.
  • look into the hostd logs and tcpdump to get more info
  4. Kerberos related issues
  • start by looking into the packet captures (on both sides, ESXi and AD) to see if we’re getting a proper TGT and TGS.

//can be related to the Kerberos cache, so in this case empty the Kerberos cache using the ‘kdestroy’ command mentioned above.

  5. Hostd crash in Likewise code
  • Gather a full log bundle and engage VMware GSS
  6. Windows AD server related issues
  • Gather guest OS logs and engage MS Support.

OK, so now we have all the troubleshooting options and methodology in one place. Now it is time for a real-life story based on one of my recent service requests: the customer is unable to log in using Active Directory credentials. The login reports invalid credentials even though “Authentication Services” shows that the host is joined to the correct domain. The issue is seen on most of the hosts within the environment. Only 2 hosts do not suffer from the problem, and we cannot find any difference in their configuration. The customer is running the latest 6.0 build: 4600944

Some other symptoms observed during troubleshooting issue step by step:

  1. Tried to remove the host from the domain using the vSphere Client GUI on a host connected to vCenter – the host stops responding until we restart hostd. Restarting all management agents hangs on the likewise agent indefinitely.
  2. Unable to stop the Active Directory Service – the server is not responding. After restarting hostd, or the entire host, the server is back to a normal operational state
  3. Change the Active Directory Service to not start with the host -> restart ESXi – works
  4. Check auth type – now ESXi states that it is Local Authentication (so after all the restarts, ESXi finally left the domain)
  5. Add the host to the domain once again – the host stops responding. Restart hostd – works fine
  6. Check auth type – ESXi states that it is joined to the domain.
  7. Try to add permissions for the domain users – unable to select a domain to assign permissions
  8. From the AD perspective – the ESXi account is refreshed

Troubleshooting Action Taken

===============

  1. Verify that the likewise agent is up and running (it is)
  2. Restart the likewise agent on the hosts (no impact on the issue)
  3. Add the advanced setting UserVars.ActiveDirectoryPreferredDomainControllers as per KB https://kb.vmware.com/kb/2107385 – didn’t help
  4. To exclude any firewall issues blocking domain controller traffic, run ~# esxcli network firewall unload and retry the login with a domain account – didn’t help
  5. Increased likewise agent logging to debug and:
     a) Retried domain authentication to see the log entries
     b) Tried to leave -> rejoin the domain using the CLI (leave successful, rejoin causes the host to hang again until we reboot the host)
  6. Verify known issues in 6.0 related to authentication with AD – the issues were resolved in 6.0U1, while the customer is using the latest patch

 

Log Analysis

  1. Trying to stop LWSMD using SSH

[~] /etc/init.d/lwsmd stop

watchdog-lwsmd: Terminating watchdog process with PID 36150 Stopping Likewise Service Manager [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] [failed to release memory reservation ] …failed

 

  2. Retry domain authentication with debug likewise logging (authentication does not succeed):

20161208115138:DEBUG:LwKrb5SetThreadDefaultCachePath():lwkrb5.c:410: Switched gss krb5 credentials path from <null> to FILE:/etc/likewise/lib/krb5cc_lsass.XXX.COM

20161208115138:DEBUG:lsass:MemCacheFindGroupByName():memcache.c:1081: Error code: 40017 (symbol: LW_ERROR_NOT_HANDLED)

20161208115138:DEBUG:lsass:LsaSrvFindProviderByName():state.c:128: Error code: 40040 (symbol: LW_ERROR_INVALID_AUTH_PROVIDER)

20161208115138:DEBUG:lsass:LsaSrvProviderServicesDomain():provider.c:151: Error code: 40040 (symbol: LW_ERROR_INVALID_AUTH_PROVIDER)

20161208115138:VERBOSE:lsass:LsaAdBatchMarshal():batch_marshal.c:525: Did not find object by NT4 name ‘ESX Admins’

20161208115138:DEBUG:lsass:LsaAdBatchFindSingleObject():batch.c:1388: Error code: 40071 (symbol: LW_ERROR_NO_SUCH_OBJECT)

20161208115138:DEBUG:lsass:AD_FindObjectByNameTypeNoCache():online.c:3519: Error code: 40071 (symbol: LW_ERROR_NO_SUCH_OBJECT)

20161208115138:DEBUG:lsass:AD_OnlineFindObjectByName():online.c:4129: Error code: 40012 (symbol: LW_ERROR_NO_SUCH_GROUP)

20161208115138:DEBUG:lsass:LsaSrvFindGroupAndExpandedMembers():api2.c:1626: Error code: 40012 (symbol: LW_ERROR_NO_SUCH_GROUP)

20161208115338:VERBOSE:lsass:LsaSrvIpcCheckPermissions():ipc_state.c:79: Permission granted for (uid = 0, gid = 0, pid = 169257) to open LsaIpcServer

20161208115338:VERBOSE:lsass-ipc:lwmsg_peer_log_accept():peer-task.c:271: (session:f09bcf7743520e1d-b414124c53159168) Accepted association 0x1f0e5be8

20161208115338:DEBUG:LwKrb5SetThreadDefaultCachePath():lwkrb5.c:410: Switched gss krb5 credentials path from <null> to FILE:/etc/likewise/lib/krb5cc_lsass.XXX.COM

20161208115338:INFO:netlogon:LWNetSrvGetDCName():dcinfo.c:97: Looking for a DC in domain ‘XXX’, site ‘<null>’ with flags 100

20161208115338:INFO:netlogon:LWNetSrvGetDCName():dcinfo.c:97: Looking for a DC in domain ‘XXX.com’, site ‘<null>’ with flags 100

20161208115338:INFO:netlogon:LWNetSrvGetDCName():dcinfo.c:97: Looking for a DC in domain ‘XXX.com’, site ‘<null>’ with flags 140

20161208115338:DEBUG:netlogon:LWNetCacheDbQuery():lwnet-cachedb.c:1079: Cached entry not found: XXX.com, , 1

20161208115338:DEBUG:netlogon:LWNetSrvGetDCName():dcinfo.c:128: Error at ../netlogon/server/api/dcinfo.c:128 [code: 1355]

20161208115338:DEBUG:netlogon:LWNetTransactGetDCName():ipc_client.c:249: Error at ../netlogon/client/ipc_client.c:249 [code: 1355]

20161208115338:DEBUG:netlogon:LWNetGetDCNameExt():dcinfo.c:133: Error at ../netlogon/client/dcinfo.c:133 [code: 1355]

 

  3. Try to rejoin the domain (which causes the host to hang in the end):

20161214123838:VERBOSE:lsass:LsaSrvIpcCheckPermissions():ipc_state.c:79: Permission granted for (uid = 0, gid = 0, pid = 39070) to open LsaIpcServer

20161214123838:VERBOSE:lsass-ipc:lwmsg_peer_log_accept():peer-task.c:271: (session:6b1bb0e33d95252a-e893c9a774c67d8e) Accepted association 0x1f07fe00

20161214123838:VERBOSE:lwreg:RegDbOpenKey():sqldb.c:1032: Registry::sqldb.c RegDbOpenKey() finished

20161214123838:DEBUG:lwreg:RegDbGetKeyValue_inlock():sqldb_p.c:1227: Error at ../lwreg/server/providers/sqlite/sqldb_p.c:1227 [status: LW_STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034 (-1073741772)]

20161214123838:DEBUG:lwreg:RegDbGetValueAttributes_inlock():sqldb_schema.c:846: Error at ../lwreg/server/providers/sqlite/sqldb_schema.c:846 [status: LW_STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034 (-1073741772)]

20161214123838:VERBOSE:lwreg:SqliteGetValueAttributes_Internal():regschema.c:360: Registry::sqldb.c SqliteGetValueAttributes_Internal() finished

20161214123838:DEBUG:lwreg:SqliteGetValue():sqliteapi.c:887: Error at ../lwreg/server/providers/sqlite/sqliteapi.c:887 [status: LW_STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034 (-1073741772)]

20161214123838:DEBUG:lwreg:RegTransactGetValueW():clientipc.c:810: Error at ../lwreg/client/clientipc.c:810 [status: LW_STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034 (-1073741772)]

20161214123838:DEBUG:lwreg:LwNtRegGetValueA():regntclient.c:801: Error at ../lwreg/client/regntclient.c:801 [status: LW_STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034 (-1073741772)]

20161214123838:DEBUG:lwreg:RegShellUtilGetValue():rsutils.c:1463: Error at ../lwreg/shellutil/rsutils.c:1463 [code: 40700]

20161214123838:DEBUG:LwpsLegacyGetDefaultJoinedDomain():lsapstore-backend-legacy-internal.c:711: -> 0 (ERROR_SUCCESS) (EE = 685)

20161214123838:DEBUG:LsaPstoreGetPasswordInfoW():lsapstore-main.c:109: -> 2692 (NERR_SetupNotJoined) (EE = 80)

20161214123838:DEBUG:LsaPstoreGetPasswordInfoA():lsapstore-main-a.c:89: -> 2692 (NERR_SetupNotJoined) (EE = 71)

20161214123838:DEBUG:lsass:AD_GetMachineAccountInfoA():machinepwdinfo.c:91: Error code: 2692 (symbol: NERR_SetupNotJoined)

20161214123838:DEBUG:lsass:AD_IoctlGetMachineAccount():ioctl.c:102: Error code: 2692 (symbol: NERR_SetupNotJoined)

20161214123838:DEBUG:lsass:AD_ProviderIoControl():provider-main.c:4377: Error code: 2692 (symbol: NERR_SetupNotJoined)

20161214123838:DEBUG:lsass:LsaSrvProviderIoControl():provider.c:99: Error code: 2692 (symbol: NERR_SetupNotJoined)
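
With debug logging enabled the likewise log grows quickly, so it helps to count which error symbols show up most often before reading it line by line. The following is a minimal Python sketch for that; the function name summarize_errors and the regular expression are my own and only cover lines in the “Error code: N (symbol: NAME)” form seen above. Lines in other formats, such as “[code: 1355]”, are deliberately ignored.

```python
import re
from collections import Counter

# Matches e.g. "Error code: 2692 (symbol: NERR_SetupNotJoined)"
ERROR_RE = re.compile(r"Error code: (\d+) \(symbol: (\w+)\)")


def summarize_errors(lines):
    """Count (code, symbol) pairs found in likewise debug log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the logging setup used in this post; adjust as needed.
    with open("/var/log/likewise.log", errors="replace") as handle:
        for (code, symbol), hits in summarize_errors(handle).most_common():
            print(f"{hits:6d}  {code}  {symbol}")
```

For the excerpts above, NERR_SetupNotJoined (2692) dominates the rejoin attempt, which matches the host reporting that it is not joined. The summary only points at where to look; it does not explain the cause.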

At this stage we decided to gather network packets and analyze the communication between ESXi and the DC. Time showed that this was a good direction:

//packet capture methodology

  • enable likewise logging:

/etc/init.d/lwsmd start

/usr/lib/vmware/likewise/bin/lwsm set-log-level trace

/usr/lib/vmware/likewise/bin/lwsm set-log file /var/log/likewise.log

tail -f /var/log/likewise.log

  • start tcp dump

tcpdump-uw -i 1 -n -s0 not tcp port 22 -C 50M -W 5 -w /var/log/capture10.pcap -vvv

 

  • add ESXi to the domain from the CLI to capture the communication flow:

/usr/lib/vmware/likewise/bin/domainjoin-cli --loglevel verbose --logfile

join xxx.com plp24308
esxi and likewise2

We found that on the problematic ESXi hosts IPv6 was disabled, but the DC was still using IPv6 in its communication. After a couple of tests we confirmed that after either enabling IPv6 on ESXi or completely disabling it on the DC side:

https://support.microsoft.com/en-us/help/929852/how-to-disable-ipv6-or-its-components-in-windows

there are no more errors when adding a host to the domain or authenticating against the DC.

To clarify the whole situation, we decided to perform an additional investigation with VMware Support. GSS confirmed that they located the issue:

“…with the newer versions (vSphere 6) of ESXi in case it receives kdc in IPv6 format. In that situation the host will try to connect with IPv6. In case host has IPv6 disabled it will fail to join the domain “

//The bug is planned to be fixed in vSphere 6.5 U1
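
The condition described by GSS can be expressed as a quick sanity check: if the host has IPv6 disabled and any KDC address it receives is IPv6, the join is at risk. Below is a minimal Python sketch of that check. The function name unreachable_ipv6_kdcs and its inputs are my own assumptions; you would feed it the addresses seen in DNS answers or in the packet capture. It does not reproduce ESXi's actual address selection logic.

```python
import ipaddress


def unreachable_ipv6_kdcs(kdc_addresses, host_ipv6_enabled):
    """Return the KDC addresses that are IPv6 while the host has IPv6 disabled.

    An empty list means the IPv6 mismatch described above does not apply.
    """
    if host_ipv6_enabled:
        return []
    return [
        address
        for address in kdc_addresses
        if ipaddress.ip_address(address).version == 6
    ]
```

For example, unreachable_ipv6_kdcs(["192.0.2.10", "2001:db8::10"], host_ipv6_enabled=False) flags only the second address (both are documentation addresses, not real ones).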

ESXi and Likewise – troubleshooting guide – part 1

Last week I had to troubleshoot a strange issue related to Active Directory integration with ESXi (version 6.0). This motivated me to prepare a small, two-article series about ESXi/Likewise integration and troubleshooting, based on my latest experience.

VMware uses PowerBroker Identity Services (formerly known as Likewise) for adding an ESX host to a Windows AD environment. As usual, it is good to start with some theoretical background about the related components and technology, and we describe it all in this part.

Below are some of the basics of PAM and Kerberos:

PAM (Pluggable Authentication Modules) – A mechanism to integrate multiple low-level authentication schemes into a high-level API. All application programs such as hostd, dcui, etc. use PAM for creating users and authenticating them. They are referred to the system-auth file, which in turn refers to /etc/security/login.map. login.map maps the ‘vpxa’ user to system-auth-local, and all other users are mapped to system-auth-generic. PAM on its own can’t implement Kerberos; it’s not possible for a PAM module to request a Kerberos service ticket (TGS) from a Kerberos key distribution center (KDC).
Kerberos – Kerberos protocol is designed to provide reliable authentication over open and insecure networks where communications between the hosts belonging to it may be intercepted. So we can say Kerberos is an authentication protocol for trusted hosts on untrusted networks.

If you are interested in deeper knowledge of this topic, take a look at this Kerberos tutorial: http://www.kerberos.org/software/tutorial.html

Before we describe the communication with ESXi, let’s gather together all Likewise components:
• Lsassd – the Likewise authentication daemon; handles authentication, authorization, caching, and idmap lookups,
• Netlogond – detects the optimal domain controller and global catalog and caches the data,
• Lwiod – the Likewise input-output service; it communicates over SMB with SMB servers,
• Caches – to maintain the current state and to improve performance, the Likewise agent caches information in several files, all of which are in /etc/likewise/db/:
  • lsass-adcache.filedb – cache managed by the AD authentication provider,
  • netlogon-cache.filedb – domain controller affinity cache, managed by netlogond,
  • pstore.filedb – repository storing the join state and machine password.

OK, now it’s time to consider how Likewise extends Kerberos authentication to ESXi:

1. The user logs in to ESXi (C# client or Web Client),
2. The username and password are sent to PAM,
3. The pam_lsass.so library communicates with Lsassd,
4. From the username and password, Lsassd generates a secret key,
5. Using the secret key, Lsassd requests a TGT from AD’s KDC,
6. The KDC verifies the secret key and then grants the ESXi host a TGT,
7. The ESXi host and the KDC exchange messages to authenticate the client,
8. Lsassd can use the TGT to request service tickets for other services such as ssh.
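
Step 4 can be illustrated with a small Python sketch. It assumes the AES encryption types, for which RFC 3962 starts the string-to-key computation with PBKDF2-HMAC-SHA1 over the password, salted by default with the realm followed by the principal name. The real algorithm then applies one more key-derivation step that is left out here, so the output is not a usable Kerberos key. Whether Lsassd uses exactly this encryption type on a given host is also an assumption; the realm, user name and password are invented for the example.

```python
import hashlib


def pbkdf2_stage(password: str, realm: str, user: str,
                 iterations: int = 4096, key_len: int = 32) -> bytes:
    """First stage of an AES string-to-key computation (sketch only).

    The salt is the realm concatenated with the principal name; the final
    derivation step required by the RFC is intentionally omitted.
    """
    salt = (realm + user).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"),
                               salt, iterations, key_len)


# Hypothetical credentials, for illustration only
key = pbkdf2_stage("S3cret!", "EXAMPLE.COM", "alice")
print(len(key), "bytes:", key.hex())
```

The point of the sketch is that the same user name, realm and password always produce the same key, so the password itself never has to travel to the KDC.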

To clarify further, let’s discuss the important algorithms (netlogon) related to this process, to address common questions:

1. How does Netlogond find the best DC (prioritization)?

Likewise Netlogon obtains a list of candidate Domain Controllers using DNS. The algorithm for doing this is based on the algorithm used in Windows Netlogon. Each candidate Domain Controller which matches the site criteria is queried with a CLDAP request for the Netlogon attribute. The time to respond for each Domain Controller is stored as PingTime in the DomainControllerInfo output parameter. The Domain Controller with the lowest PingTime is returned to the caller.

2. Preferred DC

Netlogon attempts to find the domain controller which responds the quickest to CLDAP pings with a preference for domain controllers in the same site. The algorithm is rather complex 😉
If the request includes a site, then the query order is:

a) Preferred domain controller plugin with the requested site
b) DNS with the requested site

3. Domain Join Process
The domain join involves several steps and communication with the domain:
a) creating computer account in DC
b) creating machine account and setting password
c) saving machine account/password to pstore db and updating kerberos keytab
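
The selection logic from questions 1 and 2 can be sketched as follows. This is a simplified model in Python, not Likewise's actual code: the `Candidate` type, the host names, the site names and the ping times are hypothetical, and falling back to other sites when no candidate matches the requested site is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    site: str
    ping_ms: Optional[float]  # None means no CLDAP reply was received


def pick_dc(candidates: List[Candidate],
            site: Optional[str] = None) -> Optional[Candidate]:
    """Return the responding DC with the lowest ping time,
    preferring DCs in the requested site when any of them responded."""
    alive = [c for c in candidates if c.ping_ms is not None]
    in_site = [c for c in alive if site is not None and c.site == site]
    pool = in_site or alive  # assumed fallback: any responding DC
    return min(pool, key=lambda c: c.ping_ms) if pool else None


dcs = [
    Candidate("dc1.example.com", "Warsaw", 12.0),
    Candidate("dc2.example.com", "Warsaw", 4.5),
    Candidate("dc3.example.com", "Krakow", 1.2),
    Candidate("dc4.example.com", "Warsaw", None),
]
print(pick_dc(dcs, site="Warsaw").name)  # dc2.example.com
print(pick_dc(dcs).name)                 # dc3.example.com
```

With a site requested, the fastest responding DC in that site wins even though a DC in another site answers faster; without a site, the lowest ping time alone decides.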

The last part of this article discusses the important packets visible in tcpdump, which is very useful when troubleshooting problems with joining ESXi to a domain:
1. CLDAP – Usage of CLDAP packets depends upon the attribute:

If attribute = netlogon -> these CLDAP pings are used by netlogond to verify that the domain controller is alive and to check whether it matches a specific set of requirements, such as the netlogon version (NtVer).
If attribute = time -> these are used by netlogon for selecting the nearest DC during the domain controller discovery phase.
2. ARP – Address Resolution Protocol is used for resolving a network layer address into a link layer address, i.e. an IP address to a MAC address. These are common packets, not specific to AD.

3. DNS – DNS queries related to srv records. Microsoft decided to use SRV records as a key part of the procedure whereby a client finds a domain controller. So where do these records come from? They are registered with DNS by the NetLogon service of a domain controller when it starts. There are actually quite a few of these records, but for right now let’s just look at two of them, the ones that have to do with domain controllers. They are in the following formats:
_ldap._tcp.dc._msdcs.dnsdomainname
_ldap._tcp.sitename._sites.dc._msdcs.dnsdomainname
4. KRB5 – for Kerberos 5 you would see four packets, viz. AS-REQ, AS-REP, TGS-REQ, TGS-REP:
• AS-REQ is the initial user authentication request (e.g. made with kinit). This message is directed to the KDC component known as the Authentication Server (AS);
• AS-REP is the reply of the Authentication Server to the previous request. Basically it contains the TGT (encrypted using the TGS secret key) and the session key (encrypted using the secret key of the requesting user);
• TGS-REQ is the request from the client to the Ticket Granting Server (TGS) for a service ticket. This packet includes the TGT obtained from the previous message and an authenticator generated by the client and encrypted with the session key;
• TGS-REP is the reply of the Ticket Granting Server to the previous request. It contains the requested service ticket (encrypted with the secret key of the service) and a service session key generated by the TGS and encrypted using the previous session key generated by the AS.

5. LDAP – a standards-based protocol that is used for communication between directory clients and a directory service. LDAP is the primary directory access protocol for Active Directory. LDAP searches are the most common LDAP operations that are performed against an Active Directory domain controller. An LDAP search retrieves information about all objects within a specific scope that have certain characteristics, for example, the telephone number of every person in a department.
6. NBNS – NBNS serves much the same purpose as DNS does: it translates human-readable names to IP addresses.

7. RARP – Reverse Address Resolution Protocol i.e. converts MAC address to IP

8. SSH – Secure shell packets

9. TCP – network traffic that uses TCP as the transport protocol, viz. Kerberos, SSH, HTTPS, LDAP, etc.
Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module [e.g. IP] to transmit each segment to the destination TCP.

10. SMB – Server Message Block, also known as Common Internet File System (CIFS), operates as an application-layer network protocol mainly used for providing shared access to files, printers, serial ports, and miscellaneous communications between nodes on a network. It also provides an authenticated inter-process communication mechanism.
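
The two SRV record formats from point 3 can be built mechanically from the DNS domain name and an optional site name. Below is a minimal Python sketch; the function name, the domain and the site are hypothetical examples, listing the site-specific name first is an assumption that simply mirrors the site preference described earlier, and actually resolving the records would need a DNS library, which is left out here.

```python
from typing import List, Optional


def dc_srv_names(dns_domain: str, site: Optional[str] = None) -> List[str]:
    """Build the SRV record names used to locate domain controllers.

    The site-specific name (if a site is given) comes first,
    followed by the domain-wide name.
    """
    names = []
    if site:
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{dns_domain}")
    names.append(f"_ldap._tcp.dc._msdcs.{dns_domain}")
    return names


# Hypothetical domain and site
for name in dc_srv_names("corp.example.com", site="Warsaw"):
    print(name)
```

Comparing these names with the DNS queries seen in a capture is a quick way to check whether the host is asking for the domain and site you expect.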
Now we are prepared for troubleshooting! Stay tuned for the next part 🙂

VMware vExperts 2017 awards

VMware vExpert is a non-technical award granted by a special committee every year. It is a special award because there is no exam or anything like that. VMware judges give it to individuals who have demonstrated significant contributions to the community and who share their expertise with others. This covers different kinds of activities, like blogging and sharing knowledge through other channels or social media, and it includes public speakers, book authors, CloudCred task writers, script writers, VMUG leaders and VMTN community moderators.

This year we are very pleased to announce that both of us (Paweł Piotrowski and Daniel Okrasa) were awarded this mysterious title for the first time. The whole list of vExperts 2017 can be found here.

We would like to thank all of our readers for your feedback. We promise to develop our blog with more interesting articles about our experiences and observations, mostly in relation to VMware products 🙂

VMware vSphere tags limit – is it known ?

Recently I received quite an interesting question: what is the supported maximum number of tags in vCenter 6.0U2?

The mischievous author of the question is a good friend of mine and a VMware administrator in one person. He asked about the tag limit because he wants to use tags to provide more information about each of his production VMs, which roughly speaking means creating about 20000 tags.

I thought: OK, give me a couple of seconds to verify this, and quickly looked into the VMware configuration maximums… A couple of minutes later it was clear that this is not an easy question 😉

After some additional research (there is no clear statement in the official documentation) we decided to run tests in a lab environment!

We used a simple PowerCLI script to create 20000 tags in a test vCenter appliance (6.0U2). Our script is below:

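# Assumes the tag category "test" already exists in vCenter
for ($i = 1; $i -le 20000; $i++) {
    # Create a tag named after the loop counter
    New-Tag -Name $i -Description $i -Category test
}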

The script worked like a charm without any issue, so far so good :), but when we tried to assign one tag to the first VM we encountered Web Client error 1009. Very strange!

We decided to run additional tests and found out that the limit is below 10000. At this stage we decided to clarify the issue with VMware support, and after some time we received very interesting feedback:

  1. NGC has upper bound of retrieve 10000 objects max.
  2. If the tags are less than 10000 then data service timeouts after 120 seconds(default dataservice timeout is 120 seconds).
  3. Decreasing the count to 9994 tags and increasing dataservice timeout, shows up all the tags(Assign) now.

Temporary workaround for now:
1. Have total created tags less than 10000.
2. Increase data service timeout to 600 seconds(10 min).

VMware GSS states that engineering is now working on removing the tag limit in upcoming vSphere 6.x releases.

vSphere 6.5 – Update Manager changes

Going through our list of articles about new features in vSphere 6.5, the last one is vSphere Update Manager for the vCenter Server Appliance. Since vSphere 6.5 it is fully embedded and integrated with the vCenter Server Appliance, with no Windows dependencies. This means that the vCenter Server Appliance now delivers Update Manager as an optional service, similar to Auto Deploy, etc.

Since vSphere 6.5 it is no longer possible to connect an Update Manager instance installed on a Windows Server machine to the vCenter Server Appliance.

That means you have two ways to use the Update Manager component:
• You can install the Update Manager server component either on the same Windows server where the vCenter Server is installed or on a separate machine. To install Update Manager, you must have Windows administrator credentials for the computer on which you install Update Manager.
• You can deploy vSphere Update Manager in a secured network without Internet access. In such a case, you can use the vSphere Update Manager download service to download update metadata and update binaries.

In the facelifted Web Client, the Update Manager client appears as an Update Manager tab under the Configure tab.

The management processes are largely unchanged, so there isn’t anything special worth noting here: the product is pretty simple and easy to use, and of course it does the job.