My team is currently in the process of purging millions of historical anonymous xDB contact records and associated data using the ADM module for a client whose xDB shard database sizes have been approaching max storage capacity for the Azure tier. Because xDB is a crucial portion of the client site's operations, our options for reducing the DB size have been somewhat limiting due to complex custom external integrations with xDB.
In our approach, we opted to use ADM to purge historical anonymous contacts in batches. We prepare ~300k contact records per shard for each batch, which are manually retrieved via SQL query. Once we've created the temporary table in the shard DB, we prepare the data by generating a comma-delimited list of contacts and then kick off the purge process via ADM.
When ADM populates its Tasks table, each queued record is subsequently processed by ADM and removed from the Tasks table as it completes processing that record. The ADM task execution is a generally slow process (1 contact processed every 2-3 seconds); we closely monitor the progress with a SQL query:
With this approach (in addition to SHRINK and REINDEX operations between batches), we have seen the necessary disk size reduction of both xDB shard DBs after running a cadence of several batches.
"[ADM] Response from xConnect did not indicate success. Status code: BadRequest, Message: {\"Message\":\"The remove task can't be started while another one is running.\"}" |
Upon initial analysis, we noted that the ADM tasks table was still populated with IDs that had yet to be processed when the operation was cut off. I began dissecting the ADM binary files for clues - specifically in search of the message "The remove task can't be started while another one is running".
The RemoveDataSettings record's value appeared to be a JSON representation of ADM's last ADM removal task run. However, the JSON representation was cut off after a few hundred characters. With this state of the present value, ADM was convinced that the task was not completed.
Following the approach used in the code (mimicking what should occur when an ADM removal task is completed), we ran the following command:
We also entirely cleared the remaining IDs and Tasks table and re-initialized the process. With these steps, our ADM tasks were back to processing as expected.
I hope this one helps anyone in a similar situation!
0 comments:
Post a Comment