Getting Data From the Wikipedia API Using jQuery

Where would we be without Wikipedia? It’s hard to believe how we survived without this icon of the Internet. Built on the novel idea of an encyclopedia that anyone can edit, it’s always kinda interesting how you can look up an article about one thing and then end up somewhere completely different as you jump from article to article through its link-heavy content. But more important than that, Wikipedia is the authoritative arbiter of all Internet arguments, where the combatants usually have nothing at hand but extremely biased and inflammatory blogs as source material and are too lazy to do their own research using actual respectable academic sources.

I’ve even heard that this spills over into the “real world.” Wikipedia has reported that it sees a huge spike in traffic coming in on mobile devices on Friday and Saturday nights, and the underlying cause is believed to be attempts to settle disagreements that break out from drunken arguments in bars.

So, all things considered, Wikipedia usually has a good base of information on just about any subject under the sun. And it got me thinking that a lot of websites and applications could benefit greatly from having an easy way to pull data from Wikipedia and use it to populate content. Fortunately, Wikipedia is built on software that provides an API for doing this. I remember seeing that the super-slick Google Chrome experiment 100,000 Stars, put out by the folks at Google, used Wikipedia excerpts as a brief description of each star. So I thought I’d look into the process of pulling data from Wikipedia using JavaScript and write it down because I figure someone else would find it useful at some point. Because other wikis are built upon the same foundation as Wikipedia itself, this is not limited to getting data from Wikipedia alone. You can query other wikis as well.

So to start, we’re going to need an article on Wikipedia to hit. Let’s use Jimi Hendrix. First we’re going to need to figure out what URL to call. I actually decided to use jQuery just because it’s a bit easier to parse through the returned data, but all of this could be done in native JavaScript if you wanted. So we’ll be making use of jQuery’s ajax method to make our asynchronous call.

So according to the API documentation, we call the wiki API by using an endpoint like the following…

https://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content

This is just one example. The API reference lists all of the possible actions. Reading through the docs, we can see that we’re probably going to want to specify JSON as the returned format. I chose the parse action and prop=text because I want to get the text out of the page. Rather than pulling down the entire page of text, let’s say we only want the top section (the introductory blurb of the article). To do this, we specify section=0. If you omit the “section” parameter, the entire page will be pulled down. It’s probably outside the scope of this article to explain in detail what each of these actions does, so if all of this seems a little overwhelming and it feels like we’re moving too fast, take a look through the API documentation for a more detailed description of each of these components.
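To make the pieces of that query string easier to see, here is a minimal sketch that assembles the URL from its parts. The helper name buildParseUrl is made up for this illustration and is not part of jQuery or the wiki API. It only uses the parameters discussed above (action, format, prop, section and page) and swaps spaces in the page name for underscores.

```javascript
// Hypothetical helper (not part of jQuery or the wiki API): builds the
// action=parse URL from the parameters discussed above.
function buildParseUrl(page, section) {
    var params = {
        action: "parse",
        format: "json",
        prop: "text",
        // Page names in these URLs use underscores in place of spaces.
        page: String(page).trim().replace(/\s+/g, "_")
    };

    // Leaving "section" out asks for the whole page.
    if (section !== undefined && section !== null) {
        params.section = section;
    }

    // Encode each key/value pair and join them into a query string.
    var query = Object.keys(params).map(function (key) {
        return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    }).join("&");

    return "https://en.wikipedia.org/w/api.php?" + query;
}

console.log(buildParseUrl("Jimi Hendrix", 0));
// https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&page=Jimi_Hendrix&section=0
```

The parameters come out in a different order than in the hand-written URL, which should not matter for a query string.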

So we know at the very least we’re going to need something like what is below…

$(document).ready(function(){

    $.ajax({
        type: "GET",
        url: "https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&section=0&page=Jimi_Hendrix",
        contentType: "application/json; charset=utf-8",
        async: false,
        dataType: "json",
        success: function (data, textStatus, jqXHR) {
            console.log(data);
        },
        error: function (errorMessage) {
        }
    });
});

If we try to make this call, nothing shows up in the console. The request is blocked by the browser’s Same-origin policy, and because our error handler is empty, it fails silently. Plain JSON is not going to suffice, so we’re going to need to trigger JSONP (JSON with Padding) by adding a callback parameter, callback=?, to the URL.
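To see what the callback parameter changes, here is a small simulation of the JSONP idea that runs without any network access. With JSONP the response is not bare JSON but a snippet of JavaScript that calls a function with the data, and that snippet is loaded through a script tag, which the Same-origin policy does not block. The function names (toJsonpResponse, handleJsonpResponse), the callback name and the payload below are all invented for this sketch. It is not how jQuery implements JSONP internally; jQuery replaces the ? in callback=? with a function name it generates and handles the rest for you.

```javascript
// Simulated server side: wrap the JSON payload in a call to the function
// named by the "callback" parameter. This wrapper is the "padding" in JSONP.
function toJsonpResponse(callbackName, payload) {
    return callbackName + "(" + JSON.stringify(payload) + ")";
}

// Simulated browser side: a real page would execute the response as a script.
// To avoid eval in this sketch, we split the response into the function name
// and its JSON argument, then call the matching registered callback ourselves.
function handleJsonpResponse(responseText, callbacks) {
    var match = /^([A-Za-z_$][\w$]*)\(([\s\S]*)\)$/.exec(responseText);
    if (!match || typeof callbacks[match[1]] !== "function") {
        throw new Error("Not a JSONP response for a known callback");
    }
    return callbacks[match[1]](JSON.parse(match[2]));
}

var received = null;
var callbacks = {
    // Invented name; with callback=? jQuery generates one for you.
    myCallback: function (data) {
        received = data;
        return data;
    }
};

// Made-up payload standing in for the API's reply.
var response = toJsonpResponse("myCallback", { parse: { title: "Jimi Hendrix" } });
console.log(response);             // myCallback({"parse":{"title":"Jimi Hendrix"}})

handleJsonpResponse(response, callbacks);
console.log(received.parse.title); // Jimi Hendrix
```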

Now, our AJAX call looks like the following…

$(document).ready(function(){

    $.ajax({
        type: "GET",
        url: "https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&section=0&page=Jimi_Hendrix&callback=?",
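        // JSONP is loaded through a script tag, so it is always asynchronous
        // and the contentType/async options do not apply
        dataType: "jsonp",
        success: function (data, textStatus, jqXHR) {
            console.log(data);
        },
        error: function (jqXHR, textStatus, errorThrown) {
            console.log("Request failed: " + textStatus);
        }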
    });
});

If we look in our console now, we can see we have data from Wikipedia! Wow! If we take a look at this object we can see that the part we are interested in is shown below.

{
    warnings: { ... },
    parse: {
        text: {
            "*": "<div class=\"dablink\">This article..."
        }
    }
}

Those maybe are not the key/value pairs I would have chosen, but oh well. This means that to get the markup that we need out of the object, we just have to find the right key. And if we add a div to our page with an id, e.g. <div id="article"></div>, we can dump the markup into this div like so…

$(document).ready(function(){

    $.ajax({
        type: "GET",
        url: "https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&section=0&page=Jimi_Hendrix&callback=?",
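        // JSONP is loaded through a script tag, so it is always asynchronous
        // and the contentType/async options do not apply
        dataType: "jsonp",
        success: function (data, textStatus, jqXHR) {

            // the HTML for the section lives under the "*" key
            var markup = data.parse.text["*"];
            var blurb = $('<div></div>').html(markup);
            $('#article').html(blurb.find('p'));

        },
        error: function (jqXHR, textStatus, errorThrown) {
            console.log("Request failed: " + textStatus);
        }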
    });
});
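
The success handler above reads data.parse.text["*"] directly, which throws if the response does not contain a parse object (for example, if the page name does not exist). Here is a minimal sketch of a more defensive lookup. The helper name extractMarkup is made up, and the shape of the failed response shown here is an assumption used for illustration, so check what your wiki actually returns.

```javascript
// Hypothetical helper: pulls the HTML string out of a parse response, or
// returns null when the response does not have the expected shape.
function extractMarkup(data) {
    if (!data || data.error) {
        // Assumption: a failed request comes back with an "error" object
        // instead of a "parse" object.
        return null;
    }
    var text = data.parse && data.parse.text;
    var markup = text && text["*"];
    return typeof markup === "string" ? markup : null;
}

// Same shape as the response shown earlier.
var ok = { parse: { text: { "*": "<div class=\"dablink\">This article...</div>" } } };

// Assumed shape of a failed lookup, for illustration only.
var failed = { error: { code: "missingtitle", info: "The page you specified doesn't exist." } };

console.log(extractMarkup(ok));     // <div class="dablink">This article...</div>
console.log(extractMarkup(failed)); // null
```

Inside the success handler you could then call extractMarkup(data) and only touch the #article div when the result is not null.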

Note that there are a few other issues with our returned data (like warnings and links not working) so I’ve added a few extra items below to clean things up a bit…

$(document).ready(function(){

    $.ajax({
        type: "GET",
        url: "https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&section=0&page=Jimi_Hendrix&callback=?",
        // JSONP is loaded through a script tag, so it is always asynchronous
        // and the contentType/async options do not apply
        dataType: "jsonp",
        success: function (data, textStatus, jqXHR) {

            // the HTML for the section lives under the "*" key
            var markup = data.parse.text["*"];
            var blurb = $('<div></div>').html(markup);

            // remove links as they will not work
            blurb.find('a').each(function() { $(this).replaceWith($(this).html()); });

            // remove any references
            blurb.find('sup').remove();

            // remove cite error
            blurb.find('.mw-ext-cite-error').remove();

            // keep only the paragraphs
            $('#article').html(blurb.find('p'));

        },
        error: function (jqXHR, textStatus, errorThrown) {
            console.log("Request failed: " + textStatus);
        }
    });
});
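
The handler above strips the links because the hrefs in the returned markup are relative to Wikipedia and will not resolve on your own site. An alternative, along the lines of what one reader suggests in the comments, is to keep the links and point them back at the wiki. Here is a minimal sketch of that idea as a plain function. The name toAbsoluteWikiHref is made up, and the default base URL is an assumption, so use the wiki you are actually querying.

```javascript
// Hypothetical helper: turns an href taken from the returned markup into one
// that works from another site. The default base URL is an assumption.
function toAbsoluteWikiHref(href, base) {
    base = base || "https://en.wikipedia.org";
    if (!href || href.charAt(0) === "#") {
        // Missing href or same-page anchor (e.g. a footnote): leave untouched.
        return href;
    }
    // new URL() resolves "/wiki/..." and "//host/..." against the base and
    // leaves already absolute URLs as they are.
    return new URL(href, base).href;
}

console.log(toAbsoluteWikiHref("/wiki/Electric_guitar"));
// https://en.wikipedia.org/wiki/Electric_guitar
console.log(toAbsoluteWikiHref("#cite_note-1"));
// #cite_note-1
```

In the success handler, instead of the replaceWith line, you could loop over blurb.find('a') and set each href to toAbsoluteWikiHref($(this).attr('href')).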

You can see the demo below. I have also wrapped this into a jQuery plugin called Wikiblurb.js that you can find on GitHub. The plugin has more options than our simple example here to make the portions of your wiki that you’re grabbing much more customizable.

View Demo

40 Responses to Getting Data From the Wikipedia API Using jQuery

  1. Axel says:

    Great introduction to Wikipedia API.
    I spotted a little typo, In the 1st and 2sd listing you have an unwanted “}” after the console.log call 😉

  2. Robin says:

    Thanks for that good tutorial! Really helps!

  3. NeaM says:

    Please How can I include the picture ?
    cause it displaying only text for me ?
    thanks

    • Ian says:

      Hey there NeaM — You might have to run it in a server environment or a local server like WAMP or MAMP. I pulled down the GitHub package and it didn’t seem to want to load the image from there. However, putting the package in a local server environment pulls everything (including the image) down fine.

  4. George says:

    Thanks for wasting my freaking time. Doesn’t work!

    • Ian says:

      Hey there George — Sorry it is not working for you. I just pulled down the current GitHub demo .zip and ran it from the desktop and it’s working fine… though as mentioned above, for full functionality you may need to run it in a server environment or a local server like WAMP or MAMP. What does your console say? Any errors?

      • thomas says:

        hey Ian thanks for the detailed tutorial. i learned a lot from that. and i want to say its fascinating how calm you stay if i would write a tutorial and see a comment like george`s i would just tell him to fuck off. regards and thxs again thomas

      • Ian says:

        Thanks Thomas! I’m glad that it was helpful for you. Maybe I’m completely numb to it because I’ve seen far worse said over Internet. Not to me necessarily, but others to each other. Let me know if you need help with anything else.

  5. David says:

    Hi Ian,

    I’m trying to add something like this to pages on a wordpress site.

    Could you explain how best to install the code please?

    Thanks

    • Ian says:

      Hi there David — It’s really just a matter of including your script in the page after jQuery and calling it. For WordPress the recommended way of installing scripts is to use the wp_enqueue_script function. You could also do something like this…

      <script src="<?php echo get_template_directory_uri(); ?>/js/jquery.wikiblurb.js" type="text/javascript"></script>

      in your header.php or footer.php theme files.

  6. Tim says:

    Hi, this is really helpful. I was wondering what if I just wanted all the links created to lead back to a single webpage (ie the main wikipedia page). Would you be able to explain this? Thanks.

  7. Pablo says:

    Thanks a lot, Ian! your explanation helped me get the API to work with my little project!

  8. Ty says:

    Thanks Ian. Was able to use your code to get a call to the wikipedia API done. Thanks heaps and keep up the great work 🙂

  9. prateek says:

    i want get data from wiki and pass the information to my database. Any Help?

    • Ian says:

      If you need to put information into your database, I think you’re probably better off making the same API call server-side. This is more suited for going in the response direction.

  10. Sage says:

    Hi Ian, I’m really glad to have found this it’s exactly the kind of thing I was looking for. I do have a question, how would you go about changing the page to be a user-input based variable? I’d like to make use of a search form that pulls up the specified page on submit. Thanks again, and I hope you can get back to me!

    • Ian says:

      Hi there Sage — To do something like this you’d probably want to pass that info forward via query string from the form submit, get that page value out of the query string and pass it to the plugin on this next page.

  11. Hugo Barbosa says:

    Great example… I spent hours searching for a simple way to get the basic info from a search… and here your site and info had what I wanted… Now… How could I fetch the main image from the article I looked for, and add it on top on my html dump. Any pointers or ideas?

    • Ian says:

      Hi there Hugo — I actually made this blog post into a plugin called Wikiblurb which gives you the option of getting the “infobox” (which includes the image) *or* you can specify your own custom selector(s) to grab. Give that a go and let me know if you have any troubles with it. Thanks!

  12. izumi says:

    hello, i do copy ur code to try it out.. and i want the wiki article paste on my page.. how i want to do that… example.. document.getelementbyid(“output”).innerhtml= ;.. after innerhtml what should i type

  13. izumi says:

    to admin im sorry about my question just now, i just learn javascript so it confusing but now i already understand.. tq lot… but/… i want to ask u.. for the search.. i want to replace it with my defined var but it seems that if i insert one word is okay… but for two word… do i have to put underscore when inserting more words… or is there any ways that i can do about that

  14. izumi says:

    i mean.. i replace jimi_hendrik with input(my defined var)

    • Ian says:

      Hi there izumi —

      Yes you do need the underscores but fortunately I made this blog post into a plugin which can be found here. In the initialization it looks for spaces and will insert the underscores if needed, so all you need to do is call it with the name of whatever page it is you’re trying to find.

  15. wani says:

    hello, i want to replace the jimi hendrix with input that i get from user, but i notice that if user insert two word it will produce nothing.. i know the problem is because it need hv to include ‘_’ but how to just insert two word input normally and still get the output.. thankyouuuu

    • Ian says:

      I actually made this blog post into a plugin called Wikiblurb. In the initialization there it looks for spaces and will insert the _ characters if needed, so all you need to do is call it with the name of whatever page it is you’re trying to find.

  16. Thanks for the plugin.

    It works great.

    We are creating a module, where we will upload a csv file (list of pages for query ), and it will fetch all data and insert into db, on the basis of your module.

    Will share here once done.

    Thanks again for this plugin.

  17. Weeb says:

    What can i do if i want to get more articles that are related to the keyword i search for?

  18. sam says:

    For the customization of your plugin, how can you make it so that it grabs certain pieces of data from the wikipedia page. For example, if I just want name and birth date with a summary, how could I do that?

  19. Chuck says:

    great work! I appreciate it. Is there anyway to get this to work with the links working inside of the blurb to where it opens the next page in “blurb” format as well? Thanks!

    • Ian says:

      As things are implemented right now in the callback after the data loads you would have to replace all the links in the container that the Wikipedia data is loaded into with a link to a second page or page that ran the plugin on load off of some parameters or something.

  20. Rosa says:

    How can i display all sections?

  21. anfelo says:

    Hi Ian, thanks alot! This help me to get started with javaScript ajax requests and MediaWiki. I also wanted to get a json object containing a list of articles that matched my entry, I found this helpful link:

    https://stackoverflow.com/questions/25891076/wikipedia-api-fulltext-search-to-return-articles-with-title-snippet-and-image/25911756#25911756?newreg=8e78222a30d6428c901acb10baf70667

  22. py says:

    Merci Ian !
    Lost in the ‘Random Quote Machine’ challenge of freecodecamp.org, your library saved me another day of errand.
    With some modifications I have been able to produce an array filled with the gorgeous poetry of Guillevic (you may read those marvelous lines on this page (in french, sorry)) :
    https://fr.wikiquote.org/wiki/Carnac
    Thanks, thanks a lot !
    py

  23. Theo says:

    Hi
    Thank you for the post.
    Links are working.
    This is the idea:

    //working links
    blurb.find('a').each(function() {
        var wikiAttr = $(this).attr('href');
        var hash = "#";
        var www = "www";
        if (wikiAttr.indexOf(hash) === -1 && wikiAttr.indexOf(www) === -1) {
            $(this).attr('href', 'https://de.wikipedia.org' + wikiAttr).attr('title', 'https://de.wikipedia.org' + wikiAttr).attr('target', '_blank').addClass('myWikiLink');
        }
        if ($(this).hasClass('external')) {
            $(this).attr('title', wikiAttr).attr('target', '_blank');
        }
    });

    regards
    theo

    p.s.:you may have to alter css and inspect the wiki dom

  24. Eldon says:

    Great tut, thanks for this. I have a request regarding the format this is calling. text&section=30 makes it easy to target specific sections within the wiki page, however, when we get to sections where information is presented in a table, the output is blank. My question is how we would revise this code so that a table section could be displayed? Thank you.
