Help:API: Difference between revisions
RobowaifuDev (talk | contribs) m (Page needs expansion) |
RobowaifuDev (talk | contribs) (Added search API example) |
Latest revision as of 22:10, 3 May 2023
Robowaifu.tech has an API your AI waifu can access. The site is on cheap hosting, so please cache results so it doesn't get hammered with repeated requests. Once the wiki has grown enough, I will provide a dataset download of the whole site.
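One way to honor the caching request is a small on-disk cache keyed by URL. The following is only a sketch: the `cached_get` helper, the `cache` directory name, and the one-day lifetime are assumptions made for illustration and are not part of the wiki's API.

```python
import hashlib
import json
import time
from pathlib import Path


def cached_get(url, fetch, cache_dir="cache", max_age=86400):
    """Return the JSON object for url, reusing a saved copy while it is fresh.

    fetch is any callable that takes a URL and returns a JSON-serializable
    object, for example: lambda u: requests.get(u).json()
    """
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it can safely be used as a file name
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json"
    path = directory / name
    # Reuse the saved response if it is younger than max_age seconds
    if path.exists() and time.time() - path.stat().st_mtime < max_age:
        return json.loads(path.read_text(encoding="utf-8"))
    obj = fetch(url)
    path.write_text(json.dumps(obj), encoding="utf-8")
    return obj
```

Passing the fetch function in as an argument keeps the cache independent of `requests`, so it can wrap any of the calls on this page.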
Requirements
python -m pip install requests wikitextparser bs4
Get page contents
import requests
import wikitextparser as wtp
page = "Machine learning"
page = page.replace(' ', '_')
response = requests.get(f"https://robowaifu.tech/w/api.php?action=parse&page={page}&format=json&prop=wikitext&formatversion=2")
obj = response.json()["parse"]
plain_text = wtp.parse(obj["wikitext"]).plain_text()
print(plain_text)
Result:
Machine learning is a field of study on methods that allow computers to learn from data without explicit programming. Instead of using human coded variables to perform specific tasks...
For more information, see MediaWiki API:Parsing wikitext.
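If the requested page does not exist, the API answers with an `error` object instead of a `parse` object, so indexing `["parse"]` directly raises a `KeyError`. A minimal sketch of a check, assuming the `formatversion=2` response shape used above; `extract_wikitext` is a hypothetical helper name:

```python
def extract_wikitext(obj):
    """Return the wikitext from a decoded action=parse response.

    Raises LookupError if the API reported an error instead of a page.
    """
    if "error" in obj:
        err = obj["error"]
        raise LookupError(f"{err.get('code')}: {err.get('info')}")
    # With formatversion=2 the wikitext is a plain string
    return obj["parse"]["wikitext"]
```

It would be called as `extract_wikitext(response.json())` in place of `response.json()["parse"]["wikitext"]`.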
Search wiki
import requests
import urllib.parse
from bs4 import BeautifulSoup

search = "\"information theory\""
search = urllib.parse.quote(search)
response = requests.get(f"https://robowaifu.tech/w/api.php?action=query&list=search&srwhat=text&srsearch={search}&utf8=&format=json")
for result in response.json()["query"]["search"]:
    print(f"= {result['title']} =")
    # Snippets contain HTML highlight tags; html.parser ships with Python,
    # so no parser beyond the packages listed under Requirements is needed
    print(BeautifulSoup(result["snippet"], "html.parser").text)
    print("---")
Result:
= Entropy =
...s. [[Claude Shannon]] was the first to introduce the concept of entropy in information theory, and his work laid the foundation for modern digital communication and cryp
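The search URL can also be assembled with `urllib.parse.urlencode`, which escapes every parameter in one step. The sketch below is an illustration only: `search_url` is a hypothetical helper, and `srlimit` and `sroffset` are the standard MediaWiki search parameters for page size and paging, which the example above does not use.

```python
from urllib.parse import urlencode

API_URL = "https://robowaifu.tech/w/api.php"


def search_url(query, limit=10, offset=0):
    """Build a full-text search URL with every parameter escaped."""
    params = {
        "action": "query",
        "list": "search",
        "srwhat": "text",
        "srsearch": query,
        "srlimit": limit,    # results per request
        "sroffset": offset,  # skip this many results to get the next page
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


print(search_url('"information theory"'))
```

The returned URL can be passed straight to `requests.get`.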
Get list of pages
import requests
import urllib.parse

def get_pages(apmin=100):
    '''apmin - minimum size in bytes a page must have to be included (sent as the API's apminsize parameter)'''
    pages = []
    response = requests.get(f"https://robowaifu.tech/w/api.php?action=query&format=json&list=allpages&apminsize={apmin}")
    obj = response.json()
    pages += obj["query"]["allpages"]
    # Keep requesting batches until the API stops returning a continuation value
    while "continue" in obj:
        # Escape the value, since it is derived from a page title
        apcontinue = urllib.parse.quote(obj["continue"]["apcontinue"])
        response = requests.get(f"https://robowaifu.tech/w/api.php?action=query&format=json&list=allpages&apcontinue={apcontinue}&apminsize={apmin}")
        obj = response.json()
        pages += obj["query"]["allpages"]
    return pages

pages = get_pages(apmin=100)
for page in pages:
    print(f"{page['title']}")
Result:
3D printing
Animatronics
Anime
Arduino
Art and design
Artificial intelligence
...
For more information, see MediaWiki API:Allpages.
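The continuation loop can also be written as a generator that merges whatever the API returns under `continue` back into the next request. This is a sketch under assumptions: `iter_allpages` and its `fetch` argument are hypothetical names, and `apminsize` is the minimum-size filter (in bytes) documented in MediaWiki API:Allpages.

```python
def iter_allpages(fetch, min_size=100):
    """Yield page entries one batch at a time, following API continuation.

    fetch is any callable that takes a dict of query parameters and returns
    the decoded JSON response, for example:
    lambda p: requests.get("https://robowaifu.tech/w/api.php", params=p).json()
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apminsize": min_size,
    }
    while True:
        obj = fetch(params)
        yield from obj["query"]["allpages"]
        if "continue" not in obj:
            break
        # Send the continuation values back unchanged to get the next batch
        params = {**params, **obj["continue"]}
```

Because results are yielded as they arrive, a caller can stop early without requesting the remaining batches, which is kinder to the server.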