Friday, December 13, 2019

missing chunk number 0 for toast value 120710098 in pg_toast_17620: Postgres corruption

vCenter 6.x with Postgres DB corruption causing vpxd to crash



vpxd logs

2019-04-20T16:26:10.995-06:00 error vpxd[11244] [Originator@6876 sub=Default] [VdbStatement] SQLError was thrown: "ODBC error: (XX000) - ERROR: missing chunk number 0 for toast value 120710098 in pg_toast_17620;
--> Error while executing the query" is returned when executing SQL statement "SELECT ID, CONFIG_MANAGER FROM VPX_HOST"
2019-04-20T16:26:10.995-06:00 error vpxd[11244] [Originator@6876 sub=DbBulkLoader] [VpxdDbBulkLoader::Load] Failed to load tableDef 19 from database: "ODBC error: (XX000) -
--> Error while executing the query" is returned when executing SQL statement "SELECT ID, CONFIG_MANAGER FROM VPX_HOST"
2019-04-20T16:26:10.995-06:00 error vpxd[11244] [Originator@6876 sub=Default] Win32 invalid_parameter: expression=(null), function=(null), file=(null), line=0
2019-04-20T16:26:11.029-06:00 info vpxd[11244] [Originator@6876 sub=Default] CoreDump: Writing minidump
2019-04-20T16:26:13.940-06:00 panic vpxd[11244] [Originator@6876 sub=Default]
-->
--> Panic: Win32 invalid_parameter error
--> Backtrace:
-->

Postgres logs

MDT 5cbb8054.1bc8 0 VCDB vc ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620


In most cases, the error "missing chunk number 0 for toast value 120710098 in pg_toast_17620" ends with redeploying the vCenter or restoring it from a known good backup.

In my case, the corrupted entry was an ESXi host entity. These are the steps I followed.


MDT 5cbb8054.1bc8 0 VCDB vc ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620

The log indicates which toast value is corrupt.
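If you have many log lines to go through, the values can be pulled out with a small script. This is only a sketch: the function names `parse_toast_error` and `regclass_query` are made up for illustration. It relies on the fact that the number in `pg_toast_17620` is the OID of the table that owns the TOAST data, which is what the `::regclass` lookup in the steps below resolves.

```python
import re

# Matches the PostgreSQL TOAST corruption message seen in the vpxd and Postgres logs.
TOAST_ERROR = re.compile(
    r"missing chunk number (?P<chunk>\d+) for toast value (?P<value>\d+) "
    r"in pg_toast_(?P<oid>\d+)"
)


def parse_toast_error(line):
    """Return (chunk, toast_value, owning_table_oid), or None if the line does not match."""
    match = TOAST_ERROR.search(line)
    if match is None:
        return None
    return int(match["chunk"]), int(match["value"]), int(match["oid"])


def regclass_query(oid):
    """Build the psql statement that resolves the OID to a table name."""
    return f"select {oid}::regclass;"


# Log line copied from the Postgres log above.
line = ("MDT 5cbb8054.1bc8 0 VCDB vc ERROR:  missing chunk number 0 "
        "for toast value 177603961 in pg_toast_17620")
chunk, value, oid = parse_toast_error(line)
print(value, oid, regclass_query(oid))
```

The printed statement can be pasted straight into the psql session.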

  1. Log in to the database.

              psql -U vc -d VCDB

  1. Identify the table with the corrupt rows. In this case the corruption is on vpx_host, the host table.

               VCDB=> select 17620::regclass;
               regclass
               ----------
               vpx_host
               (1 row)

  1. To find the corrupt row, run some selects.

               VCDB=> select count(*) from vpx_host;
                count
               -------
                   40

  1. List the ESXi hosts in the DB.

               VCDB=> select id,dns_name from vpx_host;
                  id   |           dns_name
               --------+------------------------------
                 26780 | vhusinvctgvm12.managed.local
                   141 | vhusinvctgvm08.managed.local
                   626 | vhusinvctgvm03.managed.local
                 18603 | mnusinvctgvm06.managed.local
                  1188 | vhusinvctgvm04.managed.local
                   536 | vhusinvctgvm02.managed.local
                  2229 | vhusinvctgvm06.managed.local
                240231 | vhusinvctgvm09.managed.local
                   277 | vhusinvctgvm10.managed.local
                  6554 | mnusinvctgvm10.managed.local
                  6833 | mnusinvctgvm12.managed.local
                242255 | vhusinvctgvm16.managed.local
                 12140 | gr-vm02.kmhd.local
                148668 | vhusinvsvsvm02.managed.local
                242209 | vhusinvctgvm13.managed.local
                   479 | vhusinvctgvm01.managed.local
                242269 | vhusinvctgvm17.managed.local
               (40 rows)

  1. Select the full rows to locate the corrupt host. The query below fails, so it reads the corrupt host row.

               VCDB=> select * from vpx_host where id<242269;
               ERROR:  missing chunk number 0 for toast value 177603961 in pg_toast_17620

  1. Since the host id is tied to multiple entities, disconnecting the host will not help us.

  1. We need to manually remove the ESX/ESXi host from the vCenter Server database.
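The manual selects used above to find the corrupt row can also be scripted by probing one host id at a time. The sketch below is an assumption-heavy illustration, not the procedure I ran. `fetch_row` is a hypothetical function you would supply yourself, wrapping `select * from vpx_host where id = <id>` with whatever database driver you use. `fake_fetch_row` and the id 141 exist only so the example runs without a database. Note that `select id,dns_name` succeeded earlier, presumably because those columns are stored inline; the probe has to read the full row to hit the missing TOAST chunk.

```python
def find_corrupt_ids(ids, fetch_row):
    """Return the ids whose full row cannot be read.

    fetch_row(id) is a caller-supplied (hypothetical) function that runs
    "select * from vpx_host where id = <id>" and raises on a database error.
    """
    corrupt = []
    for host_id in ids:
        try:
            fetch_row(host_id)
        except Exception as exc:
            # Only the TOAST error marks a corrupt row; re-raise anything else.
            if "missing chunk number" not in str(exc):
                raise
            corrupt.append(host_id)
    return corrupt


# Stand-in for a real database call so the sketch runs without a vCenter database.
def fake_fetch_row(host_id):
    if host_id == 141:  # arbitrary id chosen for the demo, not the real corrupt host
        raise RuntimeError(
            "missing chunk number 0 for toast value 177603961 in pg_toast_17620"
        )
    return {"id": host_id}


print(find_corrupt_ids([26780, 141, 626], fake_fetch_row))
```

With only 40 hosts a linear scan is cheap, so there is no need for anything smarter than one select per id.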
