Closed
Description
While parsing data from a website, I discovered that DomCrawler is very slow and poorly optimized. With a few small changes I improved performance by more than 150 times (11500 ms -> 65 ms).
The main culprit is
https://github.com/symfony/symfony/blob/master/src/Symfony/Component/DomCrawler/Crawler.php#L862
because creating a DOMXPath object is very expensive, and this method creates a new one every time it is called.
As a quick fix, I cached the instance in a static variable (I know it is very dirty):
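```php
private function createDOMXPath(\DOMDocument $document, array $prefixes = [])
{
    // Dirty hack: cache a single DOMXPath in a function-level static.
    // Caveat: the static is shared by every Crawler instance, so later calls
    // reuse the first document and ignore any new $document/$prefixes values.
    static $domxpath;

    if (null === $domxpath) {
        $domxpath = new \DOMXPath($document);

        // Namespaces are registered only once, when the object is first created.
        foreach ($prefixes as $prefix) {
            $namespace = $this->discoverNamespace($domxpath, $prefix);
            if (null !== $namespace) {
                $domxpath->registerNamespace($prefix, $namespace);
            }
        }
    }

    return $domxpath;
}
```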
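A less dirty variant might cache one DOMXPath per document instead of one per process, so that a second document never gets queried through the first document's XPath object. The sketch below is a standalone helper, not Symfony code: the class name `DomXPathCache` is hypothetical, and it assumes namespaces have already been discovered and are passed in as a `prefix => URI` map rather than going through Crawler's private `discoverNamespace()`.

```php
<?php

// Hypothetical helper (not part of Symfony): caches one DOMXPath per DOMDocument.
final class DomXPathCache
{
    /** @var \SplObjectStorage maps DOMDocument => DOMXPath */
    private \SplObjectStorage $cache;

    public function __construct()
    {
        $this->cache = new \SplObjectStorage();
    }

    /**
     * @param array<string, string> $namespaces prefix => namespace URI (assumed already discovered)
     */
    public function get(\DOMDocument $document, array $namespaces = []): \DOMXPath
    {
        // Create the DOMXPath only the first time this document is seen.
        if (!$this->cache->contains($document)) {
            $this->cache[$document] = new \DOMXPath($document);
        }
        $xpath = $this->cache[$document];

        // Register the requested prefixes on the cached instance.
        foreach ($namespaces as $prefix => $uri) {
            $xpath->registerNamespace($prefix, $uri);
        }

        return $xpath;
    }
}
```

`SplObjectStorage` keeps strong references to the documents, so they stay in memory as long as the cache does; on PHP 8 a `WeakMap` keyed by the document would let them be freed.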
A further improvement (about 200 ms in my environment) can be gained by caching the results of CssSelector::toXPath($selector) at
https://github.com/symfony/symfony/blob/master/src/Symfony/Component/DomCrawler/Crawler.php#L675
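Caching the CSS-to-XPath conversion could look roughly like this. It is a hypothetical wrapper (`CachedSelectorConverter` is not a Symfony class) and it assumes the wrapped converter is deterministic, i.e. the same selector always yields the same XPath, which is what makes memoizing it safe.

```php
<?php

// Hypothetical wrapper (not Symfony API): memoizes selector -> XPath conversions.
final class CachedSelectorConverter
{
    /** @var callable(string): string the real (assumed deterministic) converter */
    private $convert;

    /** @var array<string, string> selector => XPath */
    private array $cache = [];

    public function __construct(callable $convert)
    {
        $this->convert = $convert;
    }

    public function toXPath(string $selector): string
    {
        // Only the first request for a given selector pays the parsing cost.
        if (!isset($this->cache[$selector])) {
            $this->cache[$selector] = ($this->convert)($selector);
        }

        return $this->cache[$selector];
    }
}
```

With symfony/css-selector installed, the wrapped callable could be `[new CssSelectorConverter(), 'toXPath']`. The cache above is unbounded, so a long-running process that sees many distinct selectors may want a size limit.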