Closed
Symfony version(s) affected: 3.4 // 4.4
Description
I have a terrible performance issue when I follow the documentation to parse a quite large XML file (3 MB) with nested queries.
How to reproduce
$crawler = new Crawler();
$crawler->addXmlContent(file_get_contents(__DIR__.'/rfc1767_33000.xml'));
foreach ($crawler->filterXPath('//transaction') as $nodeTransaction) {
    $crawlerTransaction = new Crawler($nodeTransaction);
    $actionCode = $crawlerTransaction->filterXPath('transaction/documentCommand/documentCommandHeader/@type')->text();
    $transactionId = $crawlerTransaction->filterXPath('transaction/transactionIdentification/entityIdentification')->text();
    foreach ($crawlerTransaction->filterXPath('//catalogue_item_notification:catalogueItemNotification') as $nodeItem) {
        $crawlerItem = new Crawler($nodeItem);
        $documentId = $crawlerItem->filterXPath('//catalogueItemNotificationIdentification/entityIdentification')->text();
        $recipientGln = $crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/catalogueItem/dataRecipient')->text();
        $highestLevelGtin = $crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/gtin')->text();
        $targetMarket = $crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/targetMarket/targetMarketCountryCode')->text();
        $documentStatusCode = strtolower($crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/documentStatusCode')->text());
        foreach ($crawlerItem->filterXPath('catalogue_item_notification:catalogueItemNotification//catalogueItem') as $nodeCatalogueItem) {
            $crawlerCatalogueItem = new Crawler($nodeCatalogueItem);
            $gtin = $crawlerCatalogueItem->filterXPath('//catalogueItem/tradeItem/gtin')->text();
            dump($gtin);
        }
    }
}
=> Memory used: 1.2 GB, execution time: 30 s
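For anyone who wants to reproduce figures like these, here is a minimal sketch of one way to measure them. The `measure()` helper is hypothetical (it is not part of Symfony or PHP), and the workload inside the closure is only a placeholder for the loop above. Note that `memory_get_peak_usage()` reports the peak for the whole process as seen by PHP, not just for the measured callable.

```php
<?php
// Hypothetical helper (not part of Symfony): runs a callable and reports
// elapsed wall-clock time and the process-wide peak memory seen by PHP.
function measure(callable $fn): array
{
    $start = microtime(true);            // seconds, as a float
    $fn();
    $elapsedMs = (microtime(true) - $start) * 1000;

    return [
        'time_ms' => $elapsedMs,
        'peak_memory_mb' => memory_get_peak_usage(true) / 1024 / 1024,
    ];
}

$stats = measure(function (): void {
    // Placeholder workload; the parsing loop under test would go here.
    $data = range(1, 100000);
    array_sum($data);
});

printf("Memory used: %.1f MB, %.0f ms\n", $stats['peak_memory_mb'], $stats['time_ms']);
```

Running each variant in its own PHP process keeps the peak memory of one from leaking into the numbers of the other.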
With PHP's DOM extension only
$dom = new \DOMDocument('1.0');
$dom->validateOnParse = true;
@$dom->loadXML(file_get_contents(__DIR__.'/rfc1767_33000.xml'), \LIBXML_NONET);
$xpath = new \DOMXpath($dom);
foreach ($xpath->query('transaction') as $nodeTransaction) {
    $actionCode = $xpath->query('documentCommand/documentCommandHeader/@type', $nodeTransaction)->item(0)->nodeValue;
    $transactionId = $xpath->query('transactionIdentification/entityIdentification', $nodeTransaction)->item(0)->nodeValue;
    /** @var \DOMNode $nodeItem */
    foreach ($xpath->query('.//catalogue_item_notification:catalogueItemNotification', $nodeTransaction) as $nodeItem) {
        $documentId = $xpath->query('//catalogueItemNotificationIdentification/entityIdentification', $nodeItem)->item(0)->nodeValue;
        $recipientGln = $xpath->query('//catalogue_item_notification:catalogueItemNotification/catalogueItem/dataRecipient', $nodeItem)->item(0)->nodeValue;
        $highestLevelGtin = $xpath->query('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/gtin', $nodeItem)->item(0)->nodeValue;
        $targetMarket = $xpath->query('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/targetMarket/targetMarketCountryCode', $nodeItem)->item(0)->nodeValue;
        $documentStatusCode = $xpath->query('//catalogue_item_notification:catalogueItemNotification/documentStatusCode', $nodeItem)->item(0)->nodeValue;
        /** @var \DOMNode $nodeCatalogueItem */
        foreach ($xpath->query('.//catalogueItem', $nodeItem) as $nodeCatalogueItem) {
            $gtin = $xpath->query('tradeItem/gtin', $nodeCatalogueItem)->item(0)->nodeValue;
            dump($gtin);
        }
    }
}
=> Memory used: 20 MB, execution time: 500 ms
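One detail worth keeping in mind when comparing the two versions: with `DOMXPath::query()`, an expression that starts with `//` is evaluated from the document root even when a context node is passed, whereas `.//` stays inside the context node. The sketch below illustrates this on a made-up XML snippet (the `root`, `item` and `gtin` element names are only for illustration, not taken from the real file).

```php
<?php
// Made-up document: two <item> elements, each with one <gtin> child.
$dom = new \DOMDocument();
$dom->loadXML('<root><item><gtin>111</gtin></item><item><gtin>222</gtin></item></root>');
$xpath = new \DOMXPath($dom);

// Use the SECOND <item> as the context node.
$secondItem = $xpath->query('/root/item')->item(1);

// "//" starts at the document root, so the context node does not restrict
// the search: both <gtin> elements match, and item(0) is the first one.
$absolute = $xpath->query('//gtin', $secondItem);

// ".//" searches only the descendants of the context node.
$relative = $xpath->query('.//gtin', $secondItem);

printf("%d match(es), first = %s\n", $absolute->length, $absolute->item(0)->nodeValue); // 2 match(es), first = 111
printf("%d match(es), first = %s\n", $relative->length, $relative->item(0)->nodeValue); // 1 match(es), first = 222
```

So in the DOM version above, the queries beginning with `//` and followed by `->item(0)` would return the first match in the whole document on every iteration, unless they are rewritten with a leading `.`.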
Possible Solution
I don't know where the memory leak is. It seems to come from the Crawler initializations.
Additional context
Did I miss something when using the Crawler?
Thanks in advance