
In this short piece, I use public Wikipedia data, Python programming, and network analysis to extract and draw a network of Oscar-winning actors and actresses.
All images were created by the author.
Wikipedia, as the largest free, crowdsourced online encyclopedia, serves as a tremendously rich data source on many public domains. Many of these domains, from film to politics, contain several layers of networks underneath, expressing different kinds of social phenomena such as collaboration. With the Academy Awards ceremony approaching, I use the example of Oscar-winning actors and actresses to show how simple Pythonic methods can turn Wiki pages into networks.
First, let’s take a look at how, for example, the Wiki list of all Oscar-winning actors is structured:
This subpage neatly shows all the people who have ever received an Oscar and have been granted a Wiki profile (most likely, the fans have not missed any actors or actresses). In this article, I focus on acting, which is covered by the following four subpages, for leading and supporting actors and actresses:
urls = {
    'actor': 'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
    'actress': 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
    'supporting_actor': 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
    'supporting_actress': 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners',
}
Now let’s write a simple block of code that checks each of these four listings and, using the packages urllib and beautifulsoup, extracts the names of all artists:
from urllib.request import urlopen
import bs4 as bs
import re

# Iterate across the 4 categories
people_data = []
for category, url in urls.items():
    # Query the name listing page and…
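The listing above is cut off before the parsing step, so here is a minimal sketch of how names could be pulled out of one category page. It is not the author's code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs, and it assumes (this is not stated in the article) that Wikipedia lists category members as links inside an element with id="mw-pages". The names CategoryMemberParser, extract_members, and sample_html are hypothetical, and the sample HTML is a hand-written stand-in rather than a real page.

```python
from html.parser import HTMLParser


class CategoryMemberParser(HTMLParser):
    """Collect (title, href) pairs for list links inside the id="mw-pages" block.

    Assumption: category members appear as <li><a href="/wiki/..." title="...">
    entries nested somewhere inside <div id="mw-pages">.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0        # <div> nesting depth inside mw-pages (0 means outside)
        self.in_item = False  # True while inside an <li> within mw-pages
        self.members = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif attrs.get("id") == "mw-pages":
                self.depth = 1
        elif self.depth > 0 and tag == "li":
            self.in_item = True
        elif self.in_item and tag == "a":
            href = attrs.get("href") or ""
            title = attrs.get("title")
            if href.startswith("/wiki/") and title:
                self.members.append((title, href))

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "li":
            self.in_item = False


def extract_members(html):
    """Return the (title, href) pairs found in one category page's HTML."""
    parser = CategoryMemberParser()
    parser.feed(html)
    return parser.members


# Hand-written stand-in for a category page (placeholder names, not real data).
sample_html = """
<div id="content">
  <a href="/wiki/Main_Page" title="Main Page">Main page</a>
  <div id="mw-pages">
    <div class="mw-category-group"><h3>E</h3>
      <ul>
        <li><a href="/wiki/Example_Actor_One" title="Example Actor One">Example Actor One</a></li>
        <li><a href="/wiki/Example_Actor_Two" title="Example Actor Two">Example Actor Two</a></li>
      </ul>
    </div>
  </div>
  <ul><li><a href="/wiki/Help:Category" title="Help:Category">Help</a></li></ul>
</div>
"""

members = extract_members(sample_html)
```

Inside the loop, the HTML for each listing could come from urlopen(url).read().decode('utf-8'), and each returned pair could be appended to people_data together with its category key. Links outside the mw-pages block, such as navigation and help links, are ignored by design.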