Google book link only gets partial data #10
Comments
|
This works for me. It looks like this may be either a networking issue on your end or Google just blocking your IP. |
|
It no longer causes an internal server error with the update; might have to do with the xpath stuff being fixed. But now it does this:
[{"key":"3SBPBISF","version":0,"itemType":"book","creators":[],"tags":[],"title":"Some American Ladies","url":"https://books.google.com/books/about/Some_American_Ladies.html?id=Ct6FKwHhBSQC","libraryCatalog":"books.google.de","accessDate":"CURRENT_TIMESTAMP"}]
(4)(+0000000): Translate: Parsing code for Google Books (3e684d82-73a3-9a34-095f-19b112d88bbf, 2017-12-03 04:20:33)
(3)(+0000001): Translate: Beginning translation with Google Books
(3)(+0000001): Translate: resolving URL //books.google.com/books/feeds/volumes/Ct6FKwHhBSQC
(3)(+0000000): Translate: resolved to http://books.google.com/books/feeds/volumes/Ct6FKwHhBSQC
(3)(+0000001): Zotero.HTTP.doGet is deprecated. Use Zotero.HTTP.request
(3)(+0000000): HTTP GET http://books.google.com/books/feeds/volumes/Ct6FKwHhBSQC
(3)(+0000706): TypeError: Cannot read property 'textContent' of undefined
(2)(+0000000): Translate: Translation using Google Books failed: TypeError: Cannot read property 'textContent' of undefined
(5)(+0000000): Translate: Running handler 0 for error
(1)(+0000000): Translation using Google Books failed
(1)(+0000000): TypeError: Cannot read property 'textContent' of undefined
|
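For anyone reading the trace above: `Cannot read property 'textContent' of undefined` is what JavaScript throws when code takes the first element of an empty match list and reads `.textContent` from it. Below is a minimal sketch of a guard, assuming the lookup returns an array-like list of nodes. The helper name `firstText` and the sample data are hypothetical and are not taken from the Google Books translator.

```javascript
// Hypothetical helper: return the text of the first matched node, or null
// when the query matched nothing, instead of throwing a TypeError.
function firstText(nodes) {
  if (!nodes || nodes.length === 0) {
    return null;
  }
  return nodes[0].textContent;
}

// Stand-ins for query results: an empty match list and a single-node list.
const noMatch = [];
const oneMatch = [{ textContent: "Some American Ladies" }];

console.log(firstText(noMatch));  // null
console.log(firstText(oneMatch)); // Some American Ladies
```

A guard like this would not fix a query that matches nothing, but it would turn the crash into a missing field that can be logged and inspected.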
|
Can you load http://books.google.com/books/feeds/volumes/Ct6FKwHhBSQC from your IP address? |
|
Also, have you changed the User-Agent setting? Google will almost certainly block you with a non-browser User-Agent. |
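To test the feed URL by hand with a specific User-Agent, something like the following could be used. This is only a sketch: the helper `buildFeedRequest` and the User-Agent string are hypothetical, and the feed URL pattern is simply the one that appears in the debug output in this thread.

```javascript
// Hypothetical helper: build the feed URL and request headers for one volume ID.
function buildFeedRequest(volumeId, userAgent) {
  return {
    url: "http://books.google.com/books/feeds/volumes/" + encodeURIComponent(volumeId),
    headers: { "User-Agent": userAgent },
  };
}

// The User-Agent below is a placeholder, not a real browser string.
const request = buildFeedRequest("Ct6FKwHhBSQC", "Mozilla/5.0 (example browser string)");
console.log(request.url); // http://books.google.com/books/feeds/volumes/Ct6FKwHhBSQC
```

The resulting object can be handed to whatever HTTP client is available, for example `fetch(request.url, { headers: request.headers })` on a Node.js version that ships `fetch`, to compare responses with and without a browser-like User-Agent.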
|
Yes, and no, just left the default string in there. The fact that it is getting the title (Some American Ladies) and that embedded metadata runs okay does seem to indicate it's scraping it (i.e. not an IP block), just that it doesn't have the XML for some reason. (I didn't paste the success part of the output; here's the rest:)
(4)(+0000000): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-02-13 19:20:46)
(3)(+0000002): Translate: Beginning translation with Embedded Metadata
(3)(+0000001): Translate: Embedded Metadata: found 7 meta tags.
(3)(+0000000): Translate: Creating translate instance of type import in sandbox
(4)(+0000000): Translate: Binding sandbox to http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9
(4)(+0000001): Translate: Parsing code for RDF (5e3ad958-ac79-463d-812b-a86a9235c28f, 2018-05-08 19:39:38)
(3)(+0000001): Translate: Initializing RDF data store
(3)(+0000006): Translate: Promise not available in sandbox in _itemDone()
(3)(+0000000): Translate: Saving item
(5)(+0000000): Translate: Running handler 0 for itemDone
(3)(+0000009): Translate: Looking for authors in byline, vcard
(3)(+0000004): Translate: Found 0 elements with 'byline' class
(3)(+0000001): Translate: Found 0 elements with 'vcard' class
(3)(+0000001): Translate: No byline found.
(3)(+0000001): Translate: Promise not available in sandbox in _itemDone()
(3)(+0000000): Translate: Saving item
(3)(+0000000): Translate: Translation successful
(5)(+0000001): Translate: Running handler 0 for done
(3)(+0000000): itemToAPIJSON: Discarded field publicationTitle: field not valid for type book |
|
I think you'll need to add some Zotero.debug() lines to see what it's getting instead of the XML. Google tends to lock down its data exports (BibTeX, etc.) more than its webpages, so it's totally possible that the XML is blocked for some reason. |
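As a rough illustration of the kind of debug line meant here, the sketch below summarizes a response body so the log shows whether XML, an HTML page, or something else came back. The helper `describeBody` and its kind-guessing heuristic are assumptions made for this example. `Zotero.debug()` is used only when it exists, so the snippet also runs under plain Node.js.

```javascript
// Hypothetical helper: summarize a response body for a debug log.
function describeBody(body, maxChars = 200) {
  const text = String(body).trim();
  const start = text.slice(0, 20).toLowerCase();
  // Rough guess only: an XML declaration versus an HTML page.
  let kind = "unknown";
  if (start.startsWith("<?xml")) {
    kind = "xml";
  } else if (start.startsWith("<!doctype html") || start.startsWith("<html")) {
    kind = "html";
  }
  return `kind=${kind} length=${text.length} start=${JSON.stringify(text.slice(0, maxChars))}`;
}

// Use Zotero.debug() inside a translator; fall back to console.log elsewhere.
const log = (typeof Zotero !== "undefined" && typeof Zotero.debug === "function")
  ? (msg) => Zotero.debug(msg)
  : console.log;

// Sample body standing in for whatever the feed request returned.
log(describeBody("<html><body>Sorry...</body></html>"));
```

If the summary reports `kind=html` for the feed request, the translator is probably being served a block or error page rather than the volume XML.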
|
Oh, wait, I seem to be getting the same error now. We'll look into it. |
|
And now it's working for me again. Is this failing for you consistently? If you delete package-lock.json and run If so, can you add |
|
I'm getting it consistently, here's the output of the debug line:
|
Update: Maybe not, I had to look into the source of your comment, some xml tags got swallowed up by the markdown interpreter. |
|
Ok, so this is probably still just an XPath matching issue. Make sure you run Also note, that I am developing and testing on node.js v10.5.0, npm v6.1.0 |
|
I'm running the same version and the package-lock contains the same line :/ |
|
What version of node.js and npm are you on? |
|
The example http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9 works for me as well under node 8.11.1 and npm 6.0.0. @mvolz Can you activate the debug line in the translator file at https://github.com/zotero/translators/blob/master/Google%20Books.js#L93, and possibly a few more further down, to check that the DOMParser is working as expected? |
This is certainly an xpath issue, which occurred in the xpath library before the latest fix. The line on which it fails is Google Books.js:144. If @mvolz is running a version of npm that does not support package-lock.json, this is exactly where it would fail if it failed to fetch the latest version of the package. You could also try to |
|
Removing the old packages seemed to do the trick. |

(3)(+0000000): Translators initialized with 523 loaded
(3)(+0000006): Listening on 0.0.0.0:1969
(3)(+0052583): HTTP GET http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9
(1)(+0000203): Error: read ECONNRESET
InternalServerError: An error occurred retrieving the document
http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9&dq=%2522Peggy+Eaton%2522&ots=KN-Z0-HAcv&sig=snBNf7bilHi9GFH4-6-3s1ySI9Q&redir_esc=y#v=onepage&q=%2522Peggy%2520Eaton%2522&f=false