Thursday, December 26, 2019

Unable to add permissions or any identity source in vCenter 6.x


ssoAdminserver logs

[INFO ][2019-12-19T14:50:25.432Z][k4cu9c27-323-auto-91-h5:70000118] auditlogger - {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"12/19/2019 14:50:25 UTC","description":"Registering the Active Directory as identity source with domain Name 'LAB.LOCAL'","eventSeverity":"INFO","type":"com.vmware.sso.IdentitySourceManagement"}
[INFO ][2019-12-19T14:50:25.432Z][k4cu9c27-323-auto-91-h5:70000118] IdentitySourceManagementServiceImpl - [User {Name: Administrator, Domain: VSPHERE.LOCAL} with role 'Administrator'] Registering the Active Directory as identity source with domain Name 'LAB.LOCAL'
[INFO ][2019-12-19T14:50:25.488Z][k4cu9c27-323-auto-91-h5:70000118] PooledLdapConnectionFactory - New connection created in pool PooledLdapConnectionIdentity [tenantName=null, username=vcsalab.org@vsphere.local, authType=SRP, useGCPort=false, connectionString=ldap://localhost:389]
[WARN ][2019-12-19T14:50:25.551Z][k4cu9c27-323-auto-91-h5:70000118] LdapErrorChecker - Error received by LDAP client: com.vmware.identity.interop.ldap.OpenLdapClientLibrary, error code: 1
[ERROR][2019-12-19T14:50:25.556Z][k4cu9c27-323-auto-91-h5:70000118] IdentityManager - Failed to add identity provider for tenant [vsphere.local]
[ERROR][2019-12-19T14:50:25.556Z][k4cu9c27-323-auto-91-h5:70000118] ServerUtils - Exception 'com.vmware.identity.interop.ldap.OperationsErrorLdapException: Operations error LDAP error [code: 1]'


vmdird syslog

2019-12-18T10:59:17.368888+00:00 err vmdird  t@140252710672128: UpdateServerObject: InternalModifyEntry failed. Error code: 1, Error string: Schema check failed - (9612)(Objectclass (vmwDirServer) is not defined in schema)
2019-12-18T10:59:48.033055+00:00 err vmdird  t@140252710672128: UpdateServerObject: InternalModifyEntry failed. Error code: 1, Error string: Schema check failed - (9612)(Objectclass (vmwDirServer) is not defined in schema)
2019-12-19T15:09:08.088685+00:00 err vmdird  t@140036292978432: CoreLogicModifyEntry failed, DN = CN=81FD31A929956E9A1AEC546701B114C6EC48E74A,CN=Certificate-Authorities,cn=Configuration,dc=vsphere,dc=local, (9612)(Schema check failed - (9612)(Objectclass (vmwCertificationAuthority) is not defined in schema))
2019-12-19T15:09:08.089737+00:00 err vmdird  t@140036292978432: VmDirSendLdapResult: Request (Modify), Error (1), Message (Schema check failed - (9612)(Objectclass (vmwCertificationAuthority) is not defined in schema)), (0) socket (127.0.0.1)
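
The vmdird messages above all share the same "Objectclass (...) is not defined in schema" pattern, so the set of missing object classes can be summarized quickly. The following is a minimal sketch, not a VMware tool: the function name missing_objectclasses is made up for illustration, and on a real PSC you would feed it the vmdird syslog file (its location varies by version, so check your build).

```shell
# Hypothetical helper: print each objectclass that vmdird reports as missing.
# Reads log text on stdin and prints one unique class name per line.
missing_objectclasses() {
    sed -n 's/.*Objectclass (\([^)]*\)) is not defined in schema.*/\1/p' | LC_ALL=C sort -u
}

# Example using shortened copies of the log lines shown above.
printf '%s\n' \
  'err vmdird UpdateServerObject: Schema check failed - (9612)(Objectclass (vmwDirServer) is not defined in schema)' \
  'err vmdird VmDirSendLdapResult: (Schema check failed - (9612)(Objectclass (vmwCertificationAuthority) is not defined in schema))' \
  | missing_objectclasses
```

If the list is non-empty, the directory schema is behind what the services expect, which is what the schema update in the resolution below addresses.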


Resolution :


Appliance Based Platform Services Controller:


  • Take a snapshot or backup of the VCSA and PSC.
  • Connect to the Platform Services Controller with an SSH session as root.
  • Stop the Platform Services Controller services.
  • Run this command to update the vmdir schema:
      /usr/lib/vmware-vmdir/sbin/vmdird -c -u -f /usr/lib/vmware-vmdir/share/config/vmdirschema.ldif
  • Start the Platform Services Controller services.
  • Re-add the identity source.

Windows Based Platform Services Controller:


  • Log in to the Platform Services Controller machine as an Administrator.
  • Open an administrative command prompt.
  • Change to the Platform Services Controller installation directory:
      cd "C:\Program Files\VMware\vCenter Server\bin"
    Note: This is the default installation path. If you installed the Platform Services Controller to another location, adjust the path accordingly.
  • Stop all services.
  • Run this command to update the vmdir schema (the quotes are needed because the paths contain spaces):
      "C:\Program Files\VMware\vCenter Server\vmdird\vmdird.exe" -c -u -f "C:\ProgramData\VMware\vCenterServer\cfg\vmdird\vmdirschema.ldif"
    Note: This command uses the default installation path. If you installed the Platform Services Controller to another location, adjust the paths accordingly.
  • Start all services and re-add the identity source.


The KB below covers a different issue, but you can use it as a reference for updating the schema .ldif file:

https://kb.vmware.com/s/article/2144612 (Deploying or Installing an additional Platform Service Controller 6.0 Update 1b fails during vmafd firstboot)

Friday, December 13, 2019

missing chunk number 0 for toast value 120710098 in pg_toast_17620 (Postgres corruption)

vCenter 6.x with Postgres DB corruption causing vpxd to crash



vpxd logs

2019-04-20T16:26:10.995-06:00 error vpxd[11244] [Originator@6876 sub=Default] [VdbStatement] SQLError was thrown: "ODBC error: (XX000) - ERROR: missing chunk number 0 for toast value 120710098 in pg_toast_17620;
--> Error while executing the query" is returned when executing SQL statement "SELECT ID, CONFIG_MANAGER FROM VPX_HOST"
2019-04-20T16:26:10.995-06:00 error vpxd[11244] [Originator@6876 sub=DbBulkLoader] [VpxdDbBulkLoader::Load] Failed to load tableDef 19 from database: "ODBC error: (XX000) -
--> Error while executing the query" is returned when executing SQL statement "SELECT ID, CONFIG_MANAGER FROM VPX_HOST"
2019-04-20T16:26:10.995-06:00 error vpxd[11244] [Originator@6876 sub=Default] Win32 invalid_parameter: expression=(null), function=(null), file=(null), line=0
2019-04-20T16:26:11.029-06:00 info vpxd[11244] [Originator@6876 sub=Default] CoreDump: Writing minidump
2019-04-20T16:26:13.940-06:00 panic vpxd[11244] [Originator@6876 sub=Default]
-->
--> Panic: Win32 invalid_parameter error
--> Backtrace:
-->

Postgres logs

MDT 5cbb8054.1bc8 0 VCDB vc ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620


In most cases, the error "missing chunk number 0 for toast value 120710098 in pg_toast_17620" ends with redeploying vCenter or restoring it from a known good backup.

In my case, the corrupted entity was an ESXi host. The steps were as follows.


MDT 5cbb8054.1bc8 0 VCDB vc ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620

The log indicates which toast value is corrupt.
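
In PostgreSQL, a TOAST table is named pg_toast_ followed by the OID of the table that owns it, so the number in the error message can be turned directly into a lookup query. A small sketch of that extraction follows; the function name toast_table_oid is hypothetical and only illustrates the idea.

```shell
# Hypothetical helper: pull the owning table's OID out of a
# "missing chunk ... in pg_toast_<oid>" error message read from stdin.
toast_table_oid() {
    sed -n 's/.*in pg_toast_\([0-9][0-9]*\).*/\1/p' | LC_ALL=C sort -u
}

# Example with the Postgres log line shown above.
err='MDT 5cbb8054.1bc8 0 VCDB vc ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620'
oid=$(printf '%s\n' "$err" | toast_table_oid)

# Print the query to run inside psql to resolve the OID to a table name.
echo "select ${oid}::regclass;"
```

The printed statement is the same regclass lookup used in the steps that follow.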

  1. Log in to the database.

              psql -U vc -d VCDB

  2. Identify the table with the corrupt rows. In this case the corruption is on vpx_host.

              VCDB=> select 17620::regclass;
               regclass
              ----------
               vpx_host
              (1 row)

  3. To find the row, run some selects.

              VCDB=> select count(*) from vpx_host;
               count
              -------
                  40

  4. List the ESXi hosts in the DB.

              VCDB=> select id,dns_name from vpx_host;
                 id   |           dns_name
              --------+------------------------------
                26780 | vhusinvctgvm12.managed.local
                  141 | vhusinvctgvm08.managed.local
                  626 | vhusinvctgvm03.managed.local
                18603 | mnusinvctgvm06.managed.local
                 1188 | vhusinvctgvm04.managed.local
                  536 | vhusinvctgvm02.managed.local
                 2229 | vhusinvctgvm06.managed.local
               240231 | vhusinvctgvm09.managed.local
                  277 | vhusinvctgvm10.managed.local
                 6554 | mnusinvctgvm10.managed.local
                 6833 | mnusinvctgvm12.managed.local
               242255 | vhusinvctgvm16.managed.local
                12140 | gr-vm02.kmhd.local
               148668 | vhusinvsvsvm02.managed.local
               242209 | vhusinvctgvm13.managed.local
                  479 | vhusinvctgvm01.managed.local
               242269 | vhusinvctgvm17.managed.local
              (40 rows)

  5. Narrow the selects down until one fails. The failing select points to the corrupt host.

              VCDB=> select * from vpx_host where id<242269;
              ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620

  6. Since the host ID is tied to multiple entities, disconnecting the host will not help.

  7. We need to manually remove the ESX/ESXi host from the vCenter Server database.
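
One way to narrow down the corrupt row without guessing ranges is to probe every host ID individually and see which select fails. The sketch below only generates the probe statements; the function name probe_statements is made up for illustration, and the psql invocation in the comments is an assumption about how you might run it, using the same user and database as in step 1.

```shell
# Hypothetical helper: read host IDs (one per line) on stdin and emit one
# full-row probe statement per ID. Non-numeric lines are skipped.
probe_statements() {
    while read -r id; do
        case $id in
            ''|*[!0-9]*) continue ;;
        esac
        printf 'select * from vpx_host where id=%s;\n' "$id"
    done
}

# Example with two IDs from the listing above.
printf '%s\n' 479 242269 | probe_statements

# Possible usage on the vCenter (assumption, adjust to your environment):
#   psql -U vc -d VCDB -At -c 'select id from vpx_host' \
#     | probe_statements \
#     | psql -U vc -d VCDB -a -o /dev/null
# With -a each statement is echoed, so the "missing chunk" error appears
# right after the statement for the corrupt row.
```

Selecting only the id column works even on a damaged table here because, as step 4 shows, the short columns can still be read; the error only appears when the full row is fetched.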

PNID change issue on 6.7 Update 3


PNID change is supported from 6.7 U3 onwards. However, while changing the hostname from the VAMI, you may encounter the errors “Failed to create replication placeholder” and “Network update failed”.


To fix the issue, replace the pnid_utils.py file located in /usr/lib/applmgmt/networking/py/vmware/appliance/networking/pnid/ with the attached one, restart the Appliance Management service, and try changing the PNID again.

Quick ESXi commands


ESXi commands to make your daily troubleshooting easier


Lists all VMs registered on the hypervisor and provides the vmid
   
    vim-cmd vmsvc/getallvms
   
List the inventory ID of the virtual machine with the command:

     vim-cmd vmsvc/getallvms |grep <vm name>

    Note: The first column of the output shows the vmid.

Check the power state of the virtual machine with the command:

      vim-cmd vmsvc/power.getstate <vmid>

Power-on the virtual machine with the command

      vim-cmd vmsvc/power.on <vmid>

Power-off the virtual machine with the command:

      vim-cmd vmsvc/power.off <vmid>

Reboot the virtual machine (vmid as shown by the getallvms command):

      vim-cmd vmsvc/power.reboot <vmid>
    
Power Off (Hard)

Get the world ID of the virtual machine:

esxcli vm process list

TestComputer

World ID: 1625788
Process ID: 0
VMX Cartel ID: 1625786
UUID: 56 4d 9e d3 8b ce ab 59-9b 22 ac 87 40 6c 48 c3
Display Name: TestComputer

Then kill it, passing the world ID with -w:

esxcli vm process kill -t [soft,hard,force] -w <World ID>

esxcli vm process kill -t hard -w 1625788
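
Looking up the world ID by eye gets tedious on a host with many VMs. The sketch below filters the output of esxcli vm process list by display name; the function name world_id_for is hypothetical, and it assumes the output layout shown above, where the World ID line precedes the Display Name line within each VM block.

```shell
# Hypothetical helper: print the World ID of the VM whose display name is $1.
# Reads `esxcli vm process list` output on stdin.
world_id_for() {
    awk -v name="$1" '
        /^ *World ID:/ { wid = $NF }
        /^ *Display Name:/ {
            sub(/^ *Display Name: */, "")
            sub(/ *$/, "")
            if ($0 == name) print wid
        }
    '
}

# Example with the sample output from above.
sample='TestComputer
   World ID: 1625788
   Process ID: 0
   VMX Cartel ID: 1625786
   UUID: 56 4d 9e d3 8b ce ab 59-9b 22 ac 87 40 6c 48 c3
   Display Name: TestComputer'
printf '%s\n' "$sample" | world_id_for TestComputer

# On a host this could be combined with the kill command, for example:
#   esxcli vm process kill -t soft -w "$(esxcli vm process list | world_id_for TestComputer)"
```

Trying soft first and escalating to hard and then force only if the VM does not stop is the gentler order.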

Suspend a virtual machine

vim-cmd vmsvc/power.suspend <vmid>

Resume a virtual machine

vim-cmd vmsvc/power.suspendResume <vmid>

Reset a virtual machine

vim-cmd vmsvc/power.reset <vmid>

Shut down the guest OS

vim-cmd vmsvc/power.shutdown <vmid>

Deletes the vmdk and vmx files from disk
   
      vim-cmd vmsvc/destroy vmid
 
Puts hypervisor into maintenance mode
   
     vim-cmd hostsvc/maintenance_mode_enter

Takes hypervisor out of maintenance mode

     vim-cmd hostsvc/maintenance_mode_exit

Registers vm in hypervisor inventory

     vim-cmd solo/registervm /vmfs/volumes/datastore/dir/vm.vmx

Unregisters vm with hypervisor

     vim-cmd vmsvc/unregister vmid

Starts vmware tools installation for VM

     vim-cmd vmsvc/tools.install vmid

Provides information about hypervisor networking

     vim-cmd hostsvc/net/info

Shows daemons running on hypervisor. Can also be used for configuration.
   
     chkconfig -l

Same as linux top for vmware
   
     esxtop

List of vmkernel errors
   
     vmkerrcode -l

Lists a LOT of information about the esx host

     esxcfg-info

Lists information about NIC's. Can also be used for configuration.

     esxcfg-nics -l

Lists information about virtual switching. Can also be used for configuration.
     esxcfg-vswitch -l

Provides console screen to ssh session
     dcui

Vmware interactive shell
     vsish

Read System Event Log of server
     decodeSel /var/log/ipmi_sel.raw

List VMkernel NICs
     esxcfg-vmknic -l

List routes (including the default gateway)
     esxcfg-route -l

List virtual switches
     esxcfg-vswitch -l

Restart management and HA services
     /sbin/services.sh restart

Installing Software. List/Install/Uninstall  VIBs (vSphere Installation bundle)

List vibs

esxcli software vib list

Install a vib

esxcli software vib install -v file:/tmp/[NewVIB].vib

Uninstall a vib (determine the Name of the VIB by the list command)

esxcli software vib remove -n VIBname

Install a patch

esxcli software vib install -d /tmp/[patchName].zip

Network

firewall state

esxcli network firewall get

Firewall rules

esxcli network firewall ruleset list

Firewall activate a Ruleset, i.e. sshClient

esxcli network firewall ruleset set --ruleset-id=sshClient --enabled=true

Only allow an IP Range for the ssh daemon

esxcli network firewall ruleset allowedip add --ruleset-id sshServer --ip-address 192.168.254.0/24

List Kernel Network Interfaces

esxcli network ip interface list

List routing table

esxcli network ip route ipv4 list

Add a route

esxcli network ip route ipv4 add --gateway 10.1.1.254 --network 10.1.2.0/24

To make the route persistent, add the command line to an ESXi startup script. In ESXi 5.1:

/etc/rc.local.d/local.sh

In ESXi <= 5.0:

/etc/rc.local

SNMP

Set Community (Communityname: MGMCOM)

esxcli system snmp set --communities MGMCOM

Set trap destination (IP address of the management station: 192.168.254.100)

esxcli system snmp set --targets 192.168.254.100/MGMCOM

Send test trap

esxcli system snmp test

State

esxcli system snmp get

Enable IPMI as SNMP source

esxcli system snmp set --hwsrc sensors

Enable CIM as SNMP source

esxcli system snmp set --hwsrc indications

Enable snmp

esxcli system snmp set --enable true

Virtual disks

Extend a virtual disk, e.g. to 40 GB

vmkfstools -X 40G /vmfs/volumes/datastore1/Test/test.vmdk

Convert a Workstation-based virtual disk to a VMFS disk for use on ESXi. Disk to convert: Computer.vmdk
Rename the disk Computer.vmdk to Computer_WS.vmdk
mv Computer.vmdk Computer_WS.vmdk

Clone the disk
vmkfstools -i Computer_WS.vmdk -d zeroedthick Computer.vmdk
Destination disk format: VMFS thin-provisioned
Cloning disk 'Computer.vmdk'...
Clone: 100% done.

Remove Workstation disk

rm Computer_WS.vmdk

Snapshots

Get VMID

vim-cmd vmsvc/getallvms

Vmid  Name File Guest OS  Version  Annotation
36 .......................................

List all snapshots for a virtual machine

vim-cmd vmsvc/snapshot.get 36

Get Snapshot:
|-ROOT
--Snapshot Name : Installation Complete
--Snapshot Id : 1
--Snapshot Desciption :
--Snapshot Created On : 3/15/2013 13:20:34
--Snapshot State : powered off
--|-CHILD
----Snapshot Name : SNAP1
----Snapshot Id : 2
----Snapshot Desciption :
----Snapshot Created On : 5/14/2013 11:46:19
----Snapshot State : powered on

Create a snapshot, including the RAM of the machine (the last two arguments are includeMemory and quiesced, given as 1 or 0)

vim-cmd vmsvc/snapshot.create 36 "New Snap" "Snap desc" 1 0

Delete a snapshot, where 36 is the VMID and 2 the Snapshot ID

vim-cmd vmsvc/snapshot.remove 36 2
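
Because snapshot.remove needs the numeric snapshot ID, it helps to condense the snapshot.get tree into an ID and name table first. This is a rough sketch that assumes the output layout shown above; the function name snapshot_table is made up for illustration.

```shell
# Hypothetical helper: condense `vim-cmd vmsvc/snapshot.get <vmid>` output
# (read from stdin) into "<id><TAB><name>" lines.
snapshot_table() {
    awk '
        /Snapshot Name/ { sub(/^[^:]*: */, ""); name = $0 }
        /Snapshot Id/   { sub(/^[^:]*: */, ""); print $0 "\t" name }
    '
}

# Example with the snapshot tree from above.
sample='Get Snapshot:
|-ROOT
--Snapshot Name : Installation Complete
--Snapshot Id : 1
--Snapshot Created On : 3/15/2013 13:20:34
--|-CHILD
----Snapshot Name : SNAP1
----Snapshot Id : 2
----Snapshot Created On : 5/14/2013 11:46:19'
printf '%s\n' "$sample" | snapshot_table

# On a host: vim-cmd vmsvc/snapshot.get 36 | snapshot_table
```

The first column is the value to pass as the second argument of snapshot.remove.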


ESXi Advanced Kernel Settings/Parameters

Get all

esxcli system settings advanced list

Set one, e.g. suppress the ESXi Shell and SSH warnings

esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1

ESXi Kernel modules

List loaded kernel modules

vmkload_mod  -l

Get a list of all enabled kernel modules

esxcfg-module -q

Get parameters of a kernel module, e.g. an Emulex FC HBA module

esxcfg-module -g lpfc820

lpfc820 enabled = 1 options = 'lpfc0_lun_queue_depth=8 lpfc1_lun_queue_depth=8 lpfc2_lun_queue_depth=8 lpfc3_lun_queue_depth=8'

Set kernel module parameters, again using the Emulex FC HBA module as an example

esxcfg-module -s "lpfc0_lun_queue_depth=8 lpfc1_lun_queue_depth=8 lpfc2_lun_queue_depth=8 lpfc3_lun_queue_depth=8" lpfc820

Get info of a module with all possible parameters and a description of each

esxcfg-module -i lpfc820

Storage

Get a list of all mounted filesystems (datastores)

esxcli storage filesystem list

Rename a datastore

vim-cmd hostsvc/datastore/rename OldName NewName

Get a list of all storage paths

esxcli storage core path list

Get storage paths for a specific drive

esxcli storage core path list -d naa.600000e00d11000000111a2400000000

Show Storage Array Type and Path selection policies of disk devices

esxcli storage nmp device list

Set the Round Robin (VMW_PSP_RR) path selection policy for a disk device. Other possible policies are “Most Recently Used” (VMW_PSP_MRU) and “Fixed” (VMW_PSP_FIXED).

esxcli storage nmp device set -P VMW_PSP_RR -d naa.600000e00d11000000111a2400020000

Show statistics of the visorfs (hypervisor filesystem)

vdf

Usage of the visorfs

vdu

Host Services

Enable ESXi Shell

vim-cmd hostsvc/enable_esx_shell

Disable ESXi Shell

vim-cmd hostsvc/disable_esx_shell

Start ESXi Shell

vim-cmd hostsvc/start_esx_shell

Enable the ssh daemon

vim-cmd hostsvc/enable_ssh

Disable the ssh daemon

vim-cmd hostsvc/disable_ssh

Start ssh daemon

vim-cmd hostsvc/start_ssh

Enter maintenance mode

vim-cmd hostsvc/maintenance_mode_enter

vPostgres service fails to start with fatal error: bogus data in postmaster.pid

I came across this issue today where the vPostgres service was failing to start on a VCSA with a fatal error.

# service-control --start vmware-vpostgres
Perform start operation. vmon_profile=None, svc_names=['vmware-vpostgres'], include_coreossvcs=False, include_leafossvcs=False
2019-11-06T21:38:14.283Z   Service vmware-vpostgres state STOPPED
Error executing start on service vmware-vpostgres. Details {
    "resolution": null,
    "detail": [
        {
            "args": [
                "vmware-vpostgres"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'vmware-vpostgres'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
root@vcenterdr [ /var/log/vmware/vpostgres ]#

/var/log/vmware/vpostgres/serverlog.stderr log content :-
~
~
Starting service process with pid: 61183.
LOG:  skipping missing configuration file "/storage/db/vpostgres/postgresql.conf.repl"
LOG:  skipping missing configuration file "/storage/db/vpostgres/postgresql.conf.repl"
2019-11-06 21:38:14.700 UTC 5dc33d46.eeff 0   FATAL:  bogus data in lock file "postmaster.pid": ""

Location of the postmaster.pid file :-

root@vcenterdr [ /var/log/vmware/vpostgres ]# ls -l /storage/db/vpostgres/postmaster.pid
-rw------- 1 vpostgres users 45 Oct 30 17:20 /storage/db/vpostgres/postmaster.pid

I tried to cat/less the postmaster.pid file and found that it contained junk characters.

After moving this pid file aside, the vPostgres service started fine and automatically created a fresh pid file.

# mv /storage/db/vpostgres/postmaster.pid /storage/db/vpostgres/postmaster.pid_bkp


Note: Do not delete the file; move it aside as shown above, and take a proper snapshot/backup before making changes.
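
A healthy postmaster.pid starts with the postmaster's process ID on its first line, which is what PostgreSQL fails to parse when it reports "bogus data in lock file". The sketch below is a rough sanity check along those lines, not an official tool; the function name pidfile_looks_valid is hypothetical, and it only tests that the first line is a plain number.

```shell
# Hypothetical helper: return 0 if the first line of the given pid file is a
# plain decimal number, non-zero otherwise (empty, junk, or unreadable file).
pidfile_looks_valid() {
    first=$(head -n 1 "$1" 2>/dev/null | tr -d '\r')
    case $first in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example on a throwaway file that mimics a healthy pid file.
tmp=$(mktemp)
printf '61183\n/storage/db/vpostgres\n' > "$tmp"
if pidfile_looks_valid "$tmp"; then
    echo "pid file looks valid"
else
    echo "pid file looks bogus"
fi
rm -f "$tmp"

# On the VCSA (with vPostgres stopped):
#   pidfile_looks_valid /storage/db/vpostgres/postmaster.pid || echo "move it aside"
```

The check is deliberately read-only, so it is safe to run before deciding whether to move the file.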

Unable to run an automated script to back up the VCSA: a script that connects via API calls fails


Error:  A server error occurred: 'com.vmware.vapi.std.errors.unauthenticated': Unable to authenticate user (Server error id:
'vapi.security.authentication.invalid'). Check $Error[0].Exception.ServerError for more details.

ERROR:vmware.appliance.vapi.auth:Requested SSO authentication but SSO authentication module is not available


vami.log

2019-12-02T18:48:44.336 [50279]INFO:twisted:"127.0.0.1" - - [02/Dec/2019:10:48:44 +0000] "POST /api HTTP/1.1" 200 2783 "-" "vAPI http client"
2019-12-02T18:50:35.336 [50279]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
  File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 183, in authenticate
    token.validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 529, in validate
    signing_chain = self.validate_certificate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 700, in validate_certificate
    'One or more certificates cannot be verified.')AuthenticationError: One or more certificates cannot be verified.
2019-12-02T18:50:35.336 [50279]INFO:twisted:"127.0.0.1" - - [02/Dec/2019:10:50:35 +0000] "POST /api HTTP/1.1" 200 339 "-" "vAPI http client"

Note: The PSC had a total of 7 STS certificate chains, of which the chains of 2 PSCs were valid and the rest were stale.


Resolution :

  • Back up the PSC and VCSA.
  • Check whether any stale STS certificates of the old PSCs are listed.
  • Remove the STS certificate chains of the unused PSCs.
  • Restart the services on the PSC and the VCSA.

We can refer to the below KB if the issue is not with Certificates 


Unable to add vCenter to Usage Meter 3.6 after replacing the vCenter certificate

 Issue: Unable to add the vCenter endpoint to the usage meter due to certificate error Error: There was a problem checking the certificate v...