Wikipedia:Request a query

From Wikipedia, the free encyclopedia
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles that transclude a given template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

List of all talk pages matching "Archives\s*\/\s*\d{1,3}"

Usually archive pages on Wikipedia follow the format "/Archive 1", "/Archive 2", and so on. Often when talk pages are moved, the mover does not update the archiving instructions for the bots. This causes the bot to send sections to archives titled "Archives/ 1", "Archives/ 2", breaking both the naming pattern and the sequence of the archive pages. For example, the last archive before the move might be "Archive 4". After the move, newer sections go to "Archives/ 1". I need this query in order to find and fix them. Thanks! CX Zoom[he/him] (let's talk • {CX}) 20:06, 2 November 2024 (UTC)

quarry:query/87612. The {1,3} is superfluous without anything following it; I didn't assume an implicit $ since an implicit ^ to go with it would prevent any matches. If you were trying to filter out titles like Talk:.30 carbine/Archives/2014/June, you'd need something like ($|\D) afterwards. —Cryptic 21:18, 2 November 2024 (UTC)[reply]
Thank you very much! CX Zoom[he/him] (let's talk • {CX}) 10:12, 3 November 2024 (UTC)[reply]
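
The point about the trailing `($|\D)` can be checked outside Quarry. The following is a minimal sketch using Python's `re` module rather than the database's regex engine, so details may differ from what the Quarry query does; "Talk:Example" is a made-up title, and the carbine title is the one quoted above.

```python
import re

# Pattern as requested: with nothing after \d{1,3}, the {1,3} bound has no
# effect, because a match on the first digit alone already succeeds.
LOOSE = re.compile(r"Archives\s*/\s*\d{1,3}")

# Variant with the suggested suffix: the number must end after at most three
# digits, either at the end of the title or before a non-digit.
STRICT = re.compile(r"Archives\s*/\s*\d{1,3}($|\D)")

TITLES = [
    "Talk:Example/Archives/ 1",             # hypothetical broken archive name
    "Talk:Example/Archive 4",               # hypothetical well-formed archive
    "Talk:.30 carbine/Archives/2014/June",  # date-based archive from the thread
]

def matching(pattern, titles):
    """Return the titles in which the pattern matches anywhere (unanchored)."""
    return [t for t in titles if pattern.search(t)]

loose_hits = matching(LOOSE, TITLES)
strict_hits = matching(STRICT, TITLES)
```

Under these assumptions the loose pattern also picks up the year-based archive (it matches "Archives/201"), while the strict one keeps only the "Archives/ 1" title.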

List of articles likely to have one or no sources

While making this edit recently, it occurred to me that we ought to have a way of at least semi-automatically identifying and tagging articles with either a single or no sources. I'd like to be able to do an AWB run of likely such articles.

Given that there are many different ways to do sources, I'd like to start with a conservative query, which lists all articles that contain none of the following strings:

  • <ref
  • http://
  • Notes
  • cite
  • Reference
  • Sources
  • Citation
  • Bibliography
  • sfn

I don't know how to construct a RegEx query with a negative (the internet seems to have some ideas, but I struggle to convert this into Wikipedia's flavor), so I'd appreciate some help. Could anyone help me generate this list? Cheers, Sdkbtalk 05:14, 14 November 2024 (UTC)[reply]

No access to article text. —Cryptic 06:19, 14 November 2024 (UTC)[reply]
This regex search is a start. It gives 10000 results, then times out. * Pppery * it has begun... 06:24, 14 November 2024 (UTC)
You'll want to at least make that case-insensitive, anchor "ref" and maybe "cite" to word boundaries, and match "https://" too. But still, WP:Request a query isn't WP:Request a search. —Cryptic 06:34, 14 November 2024 (UTC)[reply]
...holy crap, it is. It shouldn't be. —Cryptic 06:35, 14 November 2024 (UTC)[reply]
The underlying ElasticSearch cluster has a read-only replica on Toolforge, which can be queried. So I'd say this page is the right place for such requests. – SD0001 (talk) 07:41, 14 November 2024 (UTC)[reply]
If someone comes here looking for help with Elasticsearch's middle-end, they're going to be very, very disappointed. —Cryptic 08:13, 14 November 2024 (UTC)[reply]
Thanks, @Pppery! After expanding the query to -insource:/([Rr]ef|http|[Nn]otes|[Cc]ite|[Ss]ources|[Cc]itation|[Bb]ibliography|sfn|list of|lists of|link|further reading|Wiktionary redirect)/ -intitle:list -deepcategory:"Set index articles" it's starting to turn up mostly useful results. Cheers, Sdkbtalk 07:17, 14 November 2024 (UTC)[reply]
You can get more results before it times out by adding more non-regex filters. For instance, adding -hastemplate:"Module:Citation/CS1" gives 15k results instead of just 2k. – SD0001 (talk) 07:39, 14 November 2024 (UTC)[reply]
Anyway, the sort of things this page can do to answer your original question are to give you lists of pages with zero, or zero or one, external links, or that don't transclude any of a set of templates, or both; and as a bonus filter out redirects (which I'm fairly sure search does whether you like it or not), disambigs, and - to some extent - list pages. —Cryptic 07:16, 14 November 2024 (UTC)[reply]
Maybe rename this page to WP:Request a SQL query. * Pppery * it has begun... 20:13, 14 November 2024 (UTC)[reply]
Or we could ask people to read past the page title to the first two sentences. —Cryptic 02:55, 15 November 2024 (UTC)[reply]
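
For anyone who already has page text locally (for example from a dump or an AWB run), the "contains none of these strings" test can be approximated as below. This is a rough sketch in Python, not CirrusSearch `insource:` syntax; it folds in the suggestions above (case-insensitive, word boundaries on "ref" and "cite", "https://" as well as "http://"). The function name `looks_unsourced` is invented for illustration.

```python
import re

# One alternation covering the markers from the request. A page "passes" the
# filter only when none of the alternatives is found anywhere in its text.
MARKERS = re.compile(
    r"\bref\b|https?://|notes|\bcite\b|reference|sources|citation|bibliography|sfn",
    re.IGNORECASE,
)

def looks_unsourced(wikitext):
    """Return True when the text contains none of the source markers."""
    return MARKERS.search(wikitext) is None

plain = looks_unsourced("'''Foo''' is a village.")
with_ref = looks_unsourced("Foo is a village.<ref>Bar</ref>")
with_link = looks_unsourced("See HTTPS://example.org for details.")
boundary = looks_unsourced("They prefer tea.")
```

The word boundary keeps "prefer" from counting as "ref", but plain substrings such as "reference" still match inside "preference", so the filter stays conservative: it may skip some unsourced pages, and should rarely list sourced ones.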

Syntax error due to using a reserved word as a table or column name in MySQL

https://quarry.wmcloud.org/query/87911

https://stackoverflow.com/questions/23446377/syntax-error-due-to-using-a-reserved-word-as-a-table-or-column-name-in-mysql

It isn't handling the `user` table right as "user" is an SQL reserved word, I think.

The syntax highlighter was showing "user" in red, so I surrounded it with backticks `user`, then it was showing in light blue.

I think it needs to be highlighted in white to work correctly. But how? wbm1058 (talk) 18:47, 14 November 2024 (UTC)[reply]

Unrelated to the reserved word. `WHERE IS NULL(u.user_name)` should be `WHERE u.user_name IS NULL`. But see prior noise at User talk:Primefac/Archive 32#U2 deletions if you want to continue this. * Pppery * it has begun... 20:12, 14 November 2024 (UTC)[reply]
https://www.w3schools.com/sql/sql_isnull.asp indicates that my syntax should be valid. Two alternative ways to do the same thing? Regarding the "prior noise", I'm a more competent administrator who's checking page histories, and leaving redirects within user space alone. My current focus is on cross-namespace redirects from user pages of nonexistent users to outside of userspace. My recent deletion log will give you an idea; I'm trying to make a more specific query to reduce the noise level in the query results I've been working from. – wbm1058 (talk) 20:53, 14 November 2024 (UTC)[reply]
Wikimedia uses MySQL (actually MariaDB, which uses MySQL-ish syntax), not SQL Server, where your link says ISNULL (not IS NULL, which the query uses) is valid. * Pppery * it has begun... 21:06, 14 November 2024 (UTC)
MariaDB supports ISNULL(), and it works the way Wbm1058 was trying to use it (modulo the misplaced space). SQL Server's ISNULL() is a synonym of COALESCE() instead. x IS NULL is generally safer precisely because of that incompatibility. —Cryptic 21:29, 14 November 2024 (UTC)[reply]
I tried just changing the syntax of the "IS NULL" statement as suggested. It was cooking on that for a while, and then:
"Error
This web service cannot be reached. Please contact a maintainer of this project.
Maintainers can find troubleshooting instructions from our documentation on Wikitech."
Hopefully my query didn't just crash the server. – wbm1058 (talk) 21:55, 14 November 2024 (UTC)[reply]
It just ran to completion, so simply changing the "IS NULL" statement fixed the syntax error. Now on to figure out the results, and tweak the query to do what I really want it to do. Thanks for your help. wbm1058 (talk) 22:09, 14 November 2024 (UTC)[reply]

FYI, I'm now feeling the joy. User:Wbm1058/Userpages of nonexistent users is my report of 400 pages which I think may all be safely speedy-deleted under U2: Userpage or subpage of a nonexistent user. This report was culled from a report of 1960 pages, by INTERSECT with the user table SELECT. This is indicative of the poor page-move interface design, which leads editors who think they're publishing user drafts to keep pages in userspace when they really wanted to move to mainspace, because they neglected the namespace dropdown in the move-page user interface. – wbm1058 (talk) 14:11, 15 November 2024 (UTC)[reply]
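
The `x IS NULL` form discussed above is the usual way to write an anti-join: left-join the pages to the user table and keep the rows where no user matched. Below is a small sketch of that idea, run against an in-memory SQLite database from Python instead of the MariaDB replicas. The two tables are cut-down stand-ins with invented rows, not the real MediaWiki schema (which, among other things, stores titles with underscores and would need subpages reduced to their base name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_namespace INTEGER, page_title TEXT);
CREATE TABLE "user" (user_name TEXT);   -- quoted, since the name looks reserved
INSERT INTO page VALUES (2, 'Alice'), (2, 'Ghost'), (0, 'Ghost');
INSERT INTO "user" VALUES ('Alice');
""")

# Namespace 2 is the User namespace. The LEFT JOIN keeps every user page;
# u.user_name is NULL exactly when no account with that name exists.
rows = conn.execute("""
SELECT p.page_title
FROM page AS p
LEFT JOIN "user" AS u ON u.user_name = p.page_title
WHERE p.page_namespace = 2
  AND u.user_name IS NULL
ORDER BY p.page_title
""").fetchall()

orphaned = [title for (title,) in rows]
conn.close()
```

With these toy rows, only the user page "Ghost" is returned; the mainspace page of the same name is excluded by the namespace filter.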

Dusty articles within all subcategories of a parent category

Is this possible? I'd like to get a list like Special:AncientPages but for anything within any subcategory of Category:Food and drink. It would make quite a nice little To Do list for times I feel like doing some research and writing but don't have a particular bee in my bonnet that very minute. Thanks for any help! Valereee (talk) 15:16, 17 November 2024 (UTC)[reply]

What is "dusty"? Neither Special:AncientPages nor Wikipedia talk:Special:AncientPages say what it does. Is it a sort by timestamp of last edit?
In direct subcategories only, include the handful of pages directly in the category, or the whole tree? If the last, to what depth? Examples: Category:Food and drink > Category:Dairy > Category:Dairy industry > Mark Ezell is depth 2, and Food and drink > Category:Dairy > Category:Dairy industry > Category:Dairy farming > Goatherd is depth 3; neither the page itself nor the root category counts. —Cryptic 16:51, 17 November 2024 (UTC)
Yes, it's a list of articles by date of most recent edit.
Hm, on the second question. Ideally what I'd end up with is a list of, say, food items that hadn't been edited in ten years. Or chefs, or restaurants, or food animals or whatever. Maybe I need to choose a more specific subcategory? Valereee (talk) 17:24, 17 November 2024 (UTC)
Well, ok,
The reason I need a maximum depth is because - like almost all reasonably broad categories - Category:Food and drink eventually includes a significant portion of all categories. Depth 10, for example, has 122639 different categories in the tree, out of 2.4 million total categories (including maintenance categories, category redirects, and so on), and you really quickly start getting unrelated pages like Category:Food and drink > Category:Food and drink by country > Category:Agriculture by country > Category:Agriculture in Europe by country > Category:Agriculture in Romania > Category:Forestry in Romania > Category:Romanian woodworkers > Constantin Brâncuși.
Or, if you like, you can give me a list of categories to pull from. Even if it's a large list, or something like "Anything in any direct subcategory of Category:Food and drink, Category:Cuisine, Category:Chefs, Category:Poultry, [20 or 30 other cats]". —Cryptic 18:33, 17 November 2024 (UTC)[reply]
Oh, and do you want non-mainspace pages in the list or not? What about redirects? —Cryptic 18:38, 17 November 2024 (UTC)[reply]
lol...clearly in over my head here. :D Thank you for your patience.
So, no to feed a cold, starve a fever. Yes to recipe, dulce de leche and ice milk.
I think maybe start with something that's likely to contain fewer extraneous things. Category:Chefs in a way that will allow me to see, for instance, the articles that are in Category:Chefs by nationality > Category:Women chefs by nationality > Category:British women chefs > Category:Women chefs from Northern Ireland that haven't been edited in the last ten years. Valereee (talk) 18:50, 17 November 2024 (UTC)[reply]
Oh, no non-mainspace pages, no redirects. Valereee (talk) 19:13, 17 November 2024 (UTC)[reply]
None quite that old in either tree. quarry:query/87975 for Category:Food and drink depth 3 (oldest is Land reform in the Austrian Empire, 2015-11-16 18:36:35 - see what I meant about unrelated pages?), quarry:query/87976 for Category:Chefs depth 4 (oldest is Richard Ekkebus, 2019-12-16T04:47:03). —Cryptic 19:19, 17 November 2024 (UTC)[reply]
Well, thank you for your work, and sorry to waste your time! Valereee (talk) 19:35, 17 November 2024 (UTC)[reply]
Not wasted at all. Not your fault the category system is terrible for datamining.
There might be some value in finding the latest revision that wasn't marked minor, and maybe excluding ones made by bots too, but that's going to be harder and a lot slower. Would definitely need to cut the set of articles to look at to something on the order of a couple thousand before looking at the edits, rather than the tens of thousands in that first tree. —Cryptic 20:14, 17 November 2024 (UTC)[reply]
Thanks. And I've actually already found an article that needs attention from your 87976 query, so win!
The point for me here is looking for categories that have many articles that haven't been updated since before sourcing started modernizing. It's a bit tricky because the articles that were created first -- probably in any category -- are also likely the articles that get updated often, have multiple watchers, etc. So it's possible there just aren't huge numbers of food articles that need this kind of attention. Valereee (talk) 21:18, 17 November 2024 (UTC)
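
The depth convention used in this thread (the root category is depth 0, each subcategory level adds one, and a page takes the depth of the category it sits in) can be sketched as a depth-limited breadth-first walk. The Python below uses only the two chains quoted above as a toy graph held in dictionaries; it is an illustration of the counting, not how the Quarry queries are written.

```python
from collections import deque

# Toy category graph built from the two example chains in the thread.
SUBCATS = {
    "Food and drink": ["Dairy"],
    "Dairy": ["Dairy industry"],
    "Dairy industry": ["Dairy farming"],
    "Dairy farming": [],
}
PAGES = {
    "Dairy industry": ["Mark Ezell"],
    "Dairy farming": ["Goatherd"],
}

def pages_within_depth(root, max_depth):
    """Map each page reachable from root to the shallowest depth it appears at."""
    found = {}
    seen = {root}                  # guards against category loops
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        for page in PAGES.get(cat, []):
            found.setdefault(page, depth)   # BFS reaches the shallowest depth first
        if depth < max_depth:               # do not descend past the limit
            for sub in SUBCATS.get(cat, []):
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return found

depth_2 = pages_within_depth("Food and drink", 2)
depth_3 = pages_within_depth("Food and drink", 3)
```

On a real category tree the set of visited categories grows very quickly with the limit, which is why a small maximum depth or an explicit list of categories is needed before sorting the resulting pages by last edit.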

Number of articles that are actually articles

There are 6,912,195 articles, but AIUI that includes disambig pages, stand-alone lists, and outlines, and maybe even portals (i.e., all content namespaces, not just the mainspace) but excludes redirects. Is there a way to get a count of the number of just plain old ordinary articles, excluding the other types? (A percentage from a sample set is good enough; I'd like to be able to write a sentence like "Of the 6.9 million articles, 6.2 million are regular articles, 0.45 million are lists, and 0.2 million are disambig pages.") WhatamIdoing (talk) 22:46, 17 November 2024 (UTC)

@WhatamIdoing: according to Category:All disambiguation pages, there are 362,957 of those. BD2412 T 23:29, 17 November 2024 (UTC)[reply]