The Wayback Machine - https://web.archive.org/web/20160410012212/https://phabricator.wikimedia.org/T95715
MaxSem created this task.Via Web · Apr 10 2015, 4:22 PM
MaxSem claimed this task.
MaxSem added a project: ArchCom-RfC.
MaxSem added a subscriber: MaxSem.
Restricted Application added a subscriber: Aklapper.Via Herald · Apr 10 2015, 4:22 PM
MaxSem added a comment.Via Web · Apr 10 2015, 4:54 PM

Well, there's plenty of time to fix it ;)

Aklapper added a comment.Via Web · Apr 10 2015, 7:44 PM

@MaxSem: Please use descriptive task summaries.

Aklapper changed the title from "RFC: ditch crappy API formats" to "RFC: Ditch wddx, dump, yaml, dbg, txt API formats".Via Web · Apr 10 2015, 7:44 PM
Aklapper set Security to None.
Anomie added a project: MediaWiki-API.Via Web · Apr 13 2015, 1:27 PM
Anomie moved this task to Non-Code on the MediaWiki-API workboard.
daniel moved this task to Approved on the ArchCom-RfC workboard.Via Web · Apr 15 2015, 8:55 PM
Ricordisamoa added a subscriber: Ricordisamoa.Via Web · Jul 1 2015, 7:33 PM
Legoktm closed this task as "Resolved".Edited · Via Web · Jul 2 2015, 4:44 AM
Legoktm added a subscriber: Legoktm.
Legoktm reopened this task as "Open".Via Web · Jul 2 2015, 4:45 AM

Uhh, forgot about yaml, dbg and txt.

Legoktm edited the task description.Via Web · Jul 2 2015, 4:46 AM
Krinkle added a subscriber: Krinkle.Via Web · Nov 11 2015, 9:59 PM

We briefly discussed this in the archcom today. The second step of execution here was scheduled for 12 November 2015, which is tomorrow. There doesn't seem to be a commit yet for this, is there?

I would recommend we announce this to relevant mailing lists as a reminder, and let it ride the deployment train over the course of next week. And for third party wikis, it will be released as part of MediaWiki 1.27.0.

Some numbers for reference:

https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki channel:api-feature-usage message:format

Last 7 days:

Term            Count
format=txt      267564
format=dbg      138104
format=txtfm    56095
format=yaml     21914
format=dbgfm    19
format=yamlfm   13

For context, here is the overall top 10:

https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki channel:api-feature-usage

Top 10 last 7 days:

Term                                           Count
prop=langlinks&llurl                           9318987
action=query&prop=revisions+base&generatexml   1807708
action=query&list=deletedrevs                  1142690
unclear-"now"-timestamp                        559538
action=tokens                                  477398
action=expandtemplates&!prop                   333121
format=txt                                     267581
action=search&srprop=score                     170086
format=dbg                                     138072
action=parse&generatexml                       75473
Anomie added a subscriber: Anomie.Via Web · Nov 12 2015, 3:09 AM

https://logstash.wikimedia.org/#/dashboard/elasticsearch/api-feature-usage might be slightly more useful, although in the end it's all the same data.

TL;DR summary is that hits are mostly IPs with little opportunity to contact whoever it is that's hitting these formats. There's maybe 4 where we have enough information that contact could even be possible.

Some analysis:

format=txt 267564

The top 100 is almost entirely IPs; I see one human user (who has tons of user scripts in their user .js) and one logged-in bot.

43% is coming from one IP with a generic agent, fetching extracts and pageinfo for seemingly-random articles.

10% is coming from various IPv6s that seem to belong to Facebook (they share a prefix and all include ":face:b00c:", and spot checking whois is consistent), generic agent exporting pages by pageid.

Another 9% is from one IP with a browser-like agent (probably fake), apparently fetching section 0 for US cities.

Another 8% is posts from an IP with an actually useful agent, best guess is it's a backend loading data for a phone app that matches the agent.

format=dbg 138104

99% is from one IP fetching page content with various agents, many from common web spiders. Almost certainly a live mirror of some sort.

format=txtfm 56095

96% from one IP with a generic agent, that seems to be fetching the top-revision timestamps for biographies on one wiki.

format=yaml 21914

82% requests with a "contact@" email address as the agent, at a domain that seems to be a brand monitoring/management company. Queries look like a strange way of getting HTML for various logos.

Another 9% has an agent attributing it to a particular bot. Actual queries seem to be just parsing the same page every 5 minutes.

format=dbgfm 19
format=yamlfm 13

So low it's not worth caring about.
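
For a rough sense of scale, the percentages above can be turned into approximate weekly hit counts. This is only a sketch: the totals are copied from the per-format headings above, the shares are the rounded percentages quoted in the analysis (so the results are approximate), and the labels and helper names (`approx_hits`, `breakdown`) are made up for illustration.

```python
# Weekly hit counts per format, copied from the headings in the analysis above.
HITS = {
    "txt": 267564,
    "dbg": 138104,
    "txtfm": 56095,
    "yaml": 21914,
    "dbgfm": 19,
    "yamlfm": 13,
}

# Rounded shares quoted in the analysis. Labels are illustrative shorthand.
SHARES = {
    "txt": [
        ("one IP, generic agent", 43),
        ("IPv6 range that seems to belong to Facebook", 10),
        ("one IP, browser-like agent", 9),
        ("one IP, phone app backend (best guess)", 8),
    ],
    "dbg": [("one IP, probably a live mirror", 99)],
    "txtfm": [("one IP, generic agent", 96)],
    "yaml": [
        ("contact@ agent, brand monitoring company", 82),
        ("agent attributing it to a bot", 9),
    ],
}


def approx_hits(total, percent):
    """Return total * percent / 100, rounded to nearest, using integer math."""
    return (total * percent + 50) // 100


def breakdown():
    """Map each format to a list of (label, approximate hits) pairs."""
    return {
        fmt: [(label, approx_hits(HITS[fmt], pct)) for label, pct in rows]
        for fmt, rows in SHARES.items()
    }


if __name__ == "__main__":
    print("all listed formats:", sum(HITS.values()))
    for fmt, rows in breakdown().items():
        for label, hits in rows:
            print(f"format={fmt}: ~{hits} hits from {label}")
```

Because the input percentages are already rounded, the per-client numbers are only good to within about one percent of each format's total.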

Anomie added a comment.Via Web · Nov 12 2015, 6:04 PM

> We briefly discussed this in the archcom today. The second step of execution here was scheduled for 12 November 2015, which is tomorrow. There doesn't seem to be a commit yet for this, is there?

Not that I know of. I'll make one.

> I would recommend we announce this to relevant mailing lists as a reminder, and let it ride the deployment train over the course of next week. And for third party wikis, it will be released as part of MediaWiki 1.27.0.

I'd rather give slightly more notice on the reminder: let's let it ride the train for 1.27.0-wmf.8 rather than 1.27.0-wmf.7.

gerritbot added a subscriber: gerritbot.Via Conduit · Nov 12 2015, 6:40 PM

Change 252742 had a related patch set uploaded (by Anomie):
Stop testing deprecated API formats

https://gerrit.wikimedia.org/r/252742

gerritbot added a project: Patch-For-Review.Via Conduit · Nov 12 2015, 6:40 PM

Change 252743 had a related patch set uploaded (by Anomie):
API: Remove dbg, txt, and yaml formats

https://gerrit.wikimedia.org/r/252743

gerritbot added a comment.Via Conduit · Nov 13 2015, 8:09 AM

Change 252742 merged by jenkins-bot:
Stop testing deprecated API formats

https://gerrit.wikimedia.org/r/252742

gerritbot added a comment.Via Conduit · Nov 18 2015, 4:58 PM

Change 252743 merged by jenkins-bot:
API: Remove dbg, txt, and yaml formats

https://gerrit.wikimedia.org/r/252743

Legoktm edited the task description.Via Web · Nov 18 2015, 5:51 PM
Legoktm closed this task as "Resolved".
Krinkle moved this task to Implemented on the ArchCom-RfC workboard.Via Web · Feb 10 2016, 9:33 PM
