
From Wikipedia, the free encyclopedia

In computing, linked data (often capitalized as Linked Data) is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries.[1] It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers.
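As an illustration only (not part of the article's text), publishing such interlinkable, machine-readable data might look like the following Python sketch using the rdflib library; the example.org namespace and the person described are hypothetical, chosen just to show HTTP URIs, RDF triples, and a cross-dataset link:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

# A hypothetical namespace for our own data; any HTTP URIs we control would do.
EX = Namespace("http://example.org/people/")

g = Graph()
g.bind("foaf", FOAF)

# An HTTP URI naming a thing (here, a person), plus machine-readable facts about it.
g.add((EX["alice"], RDF.type, FOAF.Person))
g.add((EX["alice"], FOAF.name, Literal("Alice")))

# A link to a thing described in another dataset; such cross-dataset links
# are what make structured data "linked" rather than merely structured.
g.add((EX["alice"], FOAF.based_near, URIRef("http://dbpedia.org/resource/Berlin")))

print(g.serialize(format="turtle"))
```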

Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project.[2]

Linked data may also be open data, in which case it is usually described as linked open data (LOD).

YouTube Encyclopedic

  • The Next Evolution of Data (Linked Data) & Understanding The Types of Big Data!
  • Data Structures: Crash Course Computer Science #14
  • Big Data: R and Linked Data Streams
  • Library Linked Data in the Cloud
  • Liberate Your Data, Cloud Computing for the User: Janine Terrano at TEDxTacoma

Transcription

[Music] Hi, thanks for tuning into Singularity Prosperity. This video is the second in a three-part series discussing big data. If you haven't seen the first video, discussing the importance and the exponential rate of growth of data, be sure to check it out. In this video, we'll be discussing how we can utilize the vast quantities of data being generated, as well as the implications of linked data on big data.

So, what is big data? It is simply large sets of data. For a better understanding of big data, let's look at its core pillars, the three Vs: volume, variety and velocity. We looked at volume in the previous video: how much data is being produced and the rate of growth of data generation. Variety in big data means having either structured or unstructured data. Structured data is something we've been optimizing since the birth of the web: data that is organized in tables within databases and mostly easy to understand. However, the vast majority of data on the web is unstructured, and it is constantly morphing, expanding and evolving. Most unstructured data is, to us, essentially static noise at this point, but this is beginning to change rapidly, as we'll explore later. The real power of big data is being able to sift through various datasets, both structured and unstructured, understand them and derive correlations and conclusions from them. You can consider big data to be the microscope we use to peer deeper into the world around us.

For example, let's take a look at the astronomical dataset we talked about in the previous video in this series. All that data on the motion of the planets would have been useless by itself, but the conclusion derived from it, that we and all the other planets orbit the Sun, validated the dataset. So first the data had to be accumulated, then a conclusion drawn from it, and finally the conclusion had to be verified by gathering more data and by other astronomers confirming it. Verifying that the conclusions and correlations drawn from big data are accurate representations of the world is the hard part. Depending on how you view data, you will draw different correlations from it, and not all of them are correct or have real-world applications.

As discussed in the previous video, with sensors almost everything is quantifiable, from the moisture in the soil to radiation in the atmosphere to the heartbeat and breathing of a newborn. Yet even now, much of our world remains to be digitized: from mapping the human body at a cellular level, to better understanding the genome, to in-depth mapping of the oceans; the list goes on. The increasing number of internet-connected devices, driven by mainstream adoption, smart cities and more, will help this digitization. Individually these products are great and contribute to our own personal well-being, but the real power comes when hundreds of millions of people contribute data and we can begin to see patterns emerge from across the world. At this point, however, data generation, or volume, isn't our issue anymore; it is understanding the variety in data and datasets, and being able to utilize them by deriving verifiable conclusions and correlations that have practical real-world applications. When working with structured data, languages such as SQL have been around for decades to assist in recognizing patterns in datasets, and as big data volume has grown, additional data frameworks such as Hadoop have been introduced.

For a simplistic example of a structured database, let's take a look at something an e-commerce website, say Amazon, would use. Imagine a structured database with columns for the gender of users, the categories of products bought and the rating given to each purchased product. By analyzing the data, they could derive which categories each gender favors, based on the ratings given to the products purchased in those categories. To validate their findings, these results would be plotted over several years, and by doing so even more patterns would be observed, like the favored categories changing with the seasons and the same pattern repeating year after year.
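As an aside from the transcript, a toy version of that example might look like the following in Python with pandas; this is our illustration, not the video's, and the columns and values are invented:

```python
import pandas as pd

# Invented purchase records: gender of user, product category, rating given.
purchases = pd.DataFrame({
    "gender":   ["F", "M", "F", "M", "F", "M"],
    "category": ["books", "books", "garden", "garden", "books", "garden"],
    "rating":   [5, 3, 4, 5, 4, 2],
})

# Average rating per gender per category: the simple kind of correlation
# the transcript describes deriving from structured data.
print(purchases.groupby(["gender", "category"])["rating"].mean())
```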
If only life were as simple as our example. Working with structured data is typically fairly simple when it comes to transactions; the real difficulty arrives when working with multiple sensors or other ambiguous datasets. Even more convoluted is unstructured data: most of the data on the web is unstructured, and correlations cannot be easily seen or derived. Some examples of unstructured data include photos, videos, social media data, satellite images, scientific data and more. These data types and sets are extremely difficult to convert into actionable insights; deriving any conclusion from unstructured data is a long, manual, painstaking process, and there is a major shortage of skills in analyzing it. Not to mention that working with structured data can be just as difficult: as humans, we can only see patterns at a certain depth before things stop making sense to us. With the increasing popularity of machine learning this is starting to change rapidly, as we'll explore in our upcoming AI video series.

As stated earlier, big data acts as a microscope, allowing us to peer deeper into the world and draw correlations and conclusions from the patterns we see, through the use of powerful algorithms and machine learning. The role of data scientists is to derive these conclusions by taking data from various sources and deciphering it, sifting and transforming the data to understand it; in other words, to recontextualize the data, put it in formats we can perceive, think and talk about, and then act on it. Machines can derive the actionable insights, but it's up to us to actually take action on them. The role of data engineers is to get data into a structured or usable format, whether from sensors or by converting the unstructured data constantly being added to the web. It should be noted that new types of data are constantly being uploaded to the web, further increasing the complexity of unstructured data, and that data engineer and data scientist are jobs that did not exist not long ago, further exemplifying how fast this field is growing.

The percentage of data on the web that was in a usable analytical format in 2013 was 22%; by 2020 this is expected to reach 37%. What this means is that most data on the web is simply unusable, but through the efforts of data engineers this is beginning to change. We can also help identify data through the use of tags, hashtags and metadata such as video descriptions and transcriptions, making the data more understandable for us and for the computers doing big data analytics. What many of the statistics mentioned in this video don't take into consideration is a huge paradigm shift that data is currently undergoing: linked data.
When we transitioned from spoken to written language, and then to digital, each revolution in the way we communicate further advanced our species and our technology. Linked data does this once again for us and our devices; it is the next step in the data revolution. If you watched my video on the evolution of the web, you'll know that linked data is critical for Web 3.0, the Semantic Web, in actually giving context and meaning to all the information we see. Linked data is also critical for laying the groundwork for the future technologies to come with Web 4.0.

So, what is linked data and what does it fix? Linked data is simply a way of structuring data so that, when you are ready, you can easily share it with the world. It aims to solve the problem of data isolation and hoarding, as well as to reduce data clutter on the web. Let me explain. If you publish a document, say an Excel file, on the web, that is a fine way to get your information out if the purpose is just reading, say in a blog. However, the data behind such a document is locked away and inaccessible to the public. If a change is made to the data in the document, the previous post and all other posts that relate to the previous dataset are essentially rendered useless, thereby increasing useless data clutter on the web. Linked data is about applying the core principle of the web, sharing information, on a much deeper level than just presenting the information. Linked data gives each entity on the web its own Uniform Resource Identifier (URI), just like the URI, also known as a URL, given to websites. This will dramatically change the way information is shared and stored on the web.

For example, say there is a post in a scientific journal discussing the correlation between atmospheric conditions and the growth of tomatoes, with data obtained through atmospheric sensors over the past two years. Without linked data, while the article would be useful in assisting farmers and gardeners to grow better-quality tomatoes, the data would only be viewable, not interactive. Also, the atmospheric dataset would have to be continually updated through new articles as more data was obtained and new correlations derived, further increasing data clutter on the web. With linked data, the latest captured sensor data could be fed continuously into the article, bringing deeper insights into the best conditions for tomato growth. Other farmers and gardeners could also begin to post their own articles on the growth of other crops, with additional information such as soil moisture and more, all linking back to the original dataset. So instead of being a static web page showing the best atmospheric conditions to grow tomatoes based on past data, this simple article evolves into a platform of its own, becoming a guide to growing various crops based on atmospheric, soil saturation and other conditions. As more people came to this hypothetical platform, it would continually evolve, and because of the structure of linked data, big data scientists or machine learning algorithms could go through the data and begin to find correlations that weren't previously known.
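To make the tomato example concrete (this sketch is ours, not the video's), here is how such by-reference linking might look in RDF using Python's rdflib; all example.org URIs, observation identifiers and values are hypothetical:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDFS

# Hypothetical URIs: the original tomato study's data and a later article.
TOMATO = Namespace("http://example.org/tomato-study/")
CUCUMBER = Namespace("http://example.org/cucumber-article/")

g = Graph()

# Each observation is its own addressable entity with an HTTP URI.
g.add((TOMATO["obs-042"], RDFS.label,
       Literal("humidity 62%, temperature 24 C, yield 4.1 kg")))

# A later publication references the original data by URI rather than
# copying it, so readers always reach the live, updated dataset.
g.add((CUCUMBER["article"], DCTERMS.references, TOMATO["obs-042"]))

print(g.serialize(format="turtle"))
```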
This video clip should provide a better representation of the power of linked data: an app developer has an idea; she uses the city council's open dataset to find every dropped-kerb location in the city and creates a route finder specifically for wheelchair users. The council learns about this and feels inspired to commission a website visualizing data on upcoming roadworks, helping commuters plan their travel. The web developer behind it is big into nature, so in his spare time he creates a similar website using an open dataset from a nature-conservation charity. His impressive site picks up a ton of shares on social media, earning the charity a free boost in exposure and donations, which it uses to fund a new dataset. That dataset quickly gets linked to a university dataset being studied by a team of researchers; the linked data reveals patterns in their data, leading to a game-changing new discovery. With linked open data, a whole new world of opportunity is at our fingertips.

An example of simple linked data we can already see in our everyday lives is the increasing sophistication of Google search. When searching for a movie, for example, Google will pull information from IMDb, Rotten Tomatoes, Wikipedia, YouTube and other trusted sites, and will even display showtimes from your local theater. Along with reducing data clutter, since data updated in one location is updated across all locations that link to it, linked data will also reduce data hoarding and isolation. Much of the data on the web is stored in what many like to call data vaults or silos. While these are sometimes necessary for protecting sensitive information, useful information is often stored away as well, due to the vast volumes of data some companies generate. Due to internal office politics, regulatory concerns and a variety of other reasons, data is often isolated even between departments of the same company. One person's useless data is another person's key to success; as linked data becomes more prevalent, information stored away in these silos can begin to be unlocked and networked together, so that more insightful correlations can be drawn between various sets of data.

Linked data will be critical not only for the transfer of human knowledge, but also as a way for machines to communicate efficiently with each other. For example, say a sensor in your backyard shows the soil is dry, while the sensors the weather network uses are predicting rainfall. Instead of wasting water in your garden, your home personal assistant will tell you not to water the plants because of a thunderstorm later, as the two sensors were able to communicate seamlessly and efficiently with each other. The machine readability that linked data is oriented around is a key feature for big data analytics, providing a structured format for data added to the web and a way to convert the web's existing data into a structured format. This added structure will allow machine learning algorithms and big data scientists to better navigate the data and use more of what is available on the web. The information on the web will begin to communicate among itself and even draw conclusions from itself, like a living organism.
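A toy illustration of that garden scenario (ours, with both machine-readable data sources stubbed out as plain dictionaries rather than real sensor feeds):

```python
# Hypothetical readings from two independent, machine-readable sources.
soil_sensor = {"moisture_pct": 18, "dry_threshold_pct": 25}
weather_forecast = {"rain_probability": 0.85}

# Each value alone is inconclusive; combined, they support a decision
# that neither device could make by itself.
soil_is_dry = soil_sensor["moisture_pct"] < soil_sensor["dry_threshold_pct"]
rain_expected = weather_forecast["rain_probability"] > 0.6

if soil_is_dry and not rain_expected:
    print("Water the garden.")
else:
    print("Skip watering: rain is expected or the soil is moist enough.")
```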
Linked datasets are beginning to grow exponentially: from none in 2006 to 30 in 2007, 300 in 2011 and, so far, 1,150 this year. At this exponential rate of growth, we'll hit 5,000 by 2020 and 25,000 by 2025! These numbers may appear small, but each dataset can have anywhere from tens to hundreds, thousands or even millions of entries; DBpedia, for example, has over 4.5 million. Not all data added to the web has to be uploaded in a linked format, and it is nearly impossible that it all ever will be. However, it is critical that we get the most important pieces of data linked together: medical research, traffic maps, environmental data and other very important datasets. As time progresses, as with any technology, exponential adoption will bring change faster than many expect. Linked data will open up a new era of innovation, paving the path for future technologies and for better utilization of current technologies and data. Blockchain is also poised to radically transform linked data, but that is a topic best left for another video.

The third V of big data is velocity. While in the first video in this series we looked at one aspect of velocity, the rate of growth of the volume of big data, velocity also means the rate at which big data is analyzed. In today's increasingly mobile society, people demand real-time results; I'm sure many of you have experienced frustration when a webpage takes longer than a few milliseconds to load. Linked data assists in increasing the velocity of data processing since, as discussed earlier, changes made to data in one location are propagated across the entire network. Unfortunately, it's not always as simple as a change to an already-analyzed dataset: given the size and complexity of the big datasets yet to be analyzed, even a supercomputer could take days, months or even years to derive any conclusions from the data, let alone deliver them in real time to devices across the world. As we'll explore in the next video in this digital infrastructure series, cloud computing will play a pivotal role in big data analytics, allowing everyone, from individuals to companies and startups, to access big data insights in real time. In the next few videos in our big data series we'll also explore some of the use cases of big data, as well as the issues big data poses to society.

At this point the video has come to a conclusion; I'd like to thank you for taking the time to watch it. If you enjoyed it, please leave a thumbs up, and if you want me to elaborate on any of the topics discussed or have any topic suggestions, please leave them in the comments below. Consider subscribing to my channel for more content, follow my Medium publication for accompanying blogs and like my Facebook page for more bite-sized chunks of content. This has been Ankur; you've been watching Singularity Prosperity, and I'll see you again soon! [Music]

Principles

Tim Berners-Lee outlined four principles of linked data in his "Linked Data" note of 2006,[2] paraphrased along the following lines:

  1. Use URIs to name (identify) things.
  2. Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
  3. Provide useful information about what a name identifies when it's looked up, using open standards such as RDF, SPARQL, etc.
  4. Refer to other things using their HTTP URI-based names when publishing data on the Web.
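
As an illustration of these principles in practice (our sketch, not part of the original note), the following Python snippet uses rdflib to dereference a real HTTP URI, DBpedia's entry for Berlin; it assumes network access and that the endpoint still serves content-negotiated RDF:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDFS

# Principles 1 and 2: the thing's name is an HTTP URI, so it can be looked up.
uri = URIRef("http://dbpedia.org/resource/Berlin")

# Principle 3: looking the name up returns useful information in a standard
# format; rdflib negotiates for RDF when parsing from a URL.
g = Graph().parse(uri)

# Principle 4: the returned data refers to related things by their own
# HTTP URIs, each of which could be dereferenced in turn.
for label in g.objects(uri, RDFS.label):
    if isinstance(label, Literal) and label.language == "en":
        print(label)
```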

Tim Berners-Lee gave a presentation on linked data at the TED 2009 conference.[3] In it, he restated the linked data principles as three "extremely simple" rules:

  1. All kinds of conceptual things, they have names now that start with HTTP.
  2. If I take one of these HTTP names and I look it up...I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event.
  3. When I get back that information it's not just got somebody's height and weight and when they were born, it's got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it's related to is given one of those names that starts with HTTP.

Components

Linked open data

Linked open data is linked data that is open data.[4][5][6] Tim Berners-Lee gives the clearest definition of linked open data, differentiating it from linked data:

Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free.

— Tim Berners-Lee, Linked Data[2][7]

Large linked open data sets include DBpedia and Wikidata.
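
To show how such datasets are typically consumed, here is a sketch (ours, for illustration) using the SPARQLWrapper Python library against DBpedia's public SPARQL endpoint; endpoint availability is assumed:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Linked_data>
            <http://www.w3.org/2000/01/rdf-schema#label> ?label .
        FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)

# Run the query and print the English label of the resource.
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```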

History

The term "linked open data" has been in use since at least February 2007, when the "Linking Open Data" mailing list[8] was created.[9] The mailing list was initially hosted by the SIMILE project[10] at the Massachusetts Institute of Technology.

Linking Open Data community project

The above diagram shows which Linking Open Data datasets are connected, as of August 2014. This was produced by the Linked Open Data Cloud project, which was started in 2007. Some sets may include copyrighted data which is freely available.[11]
The same diagram as above, but for February 2017, showing the growth in just two and a half years.

The goal of the W3C Semantic Web Education and Outreach group's Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links.[12][13] By September 2011 this had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. A detailed statistical breakdown was published in 2014.[14]
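
The "RDF links" counted in these statistics are simply triples whose subject and object belong to different datasets. A minimal sketch (ours, using rdflib; the GeoNames identifier shown is the real one for Berlin, used purely as an example):

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

# One RDF link: the subject lives in DBpedia, the object in GeoNames,
# and owl:sameAs asserts they describe the same real-world city.
g.add((
    URIRef("http://dbpedia.org/resource/Berlin"),
    OWL.sameAs,
    URIRef("http://sws.geonames.org/2950159/"),
))

print(g.serialize(format="turtle"))
```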

European Union projects

There are a number of European Union projects involving linked data. These include the linked open data around the clock (LATC) project,[15] the PlanetData project,[16] the DaPaaS (Data-and-Platform-as-a-Service) project,[17] and the Linked Open Data 2 (LOD2) project.[18][19][20] Data linking is one of the main goals of the EU Open Data Portal, which makes available thousands of datasets for anyone to reuse and link.

Datasets

  • DBpedia – a dataset containing data extracted from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages
  • FOAF – a dataset describing persons, their properties and relationships
  • GeoNames – provides RDF descriptions of more than 7,500,000 geographical features worldwide
  • UMBEL – a lightweight reference structure of 20,000 subject concept classes and their relationships derived from OpenCyc, which can act as binding classes to external data; also has links to 1.5 million named entities from DBpedia and YAGO
  • Wikidata – a collaboratively created linked dataset that acts as central storage for the structured data of its Wikimedia Foundation sister projects

Dataset instance and class relationships

Clickable diagrams that show the individual datasets and their relationships within the DBpedia-spawned LOD cloud (as shown in the figures above) are available.[21][22]

See also

References

  1. ^ Bizer, Christian; Heath, Tom; Berners-Lee, Tim (2009). "Linked Data – The Story So Far" (PDF). International Journal on Semantic Web and Information Systems. 5 (3). doi:10.4018/jswis.2009081901.
  2. ^ a b c Tim Berners-Lee (2006-07-27). "Linked Data". Design Issues. W3C. Retrieved 2010-12-18.
  3. ^ "Tim Berners-Lee on the next Web".
  4. ^ "Frequently Asked Questions (FAQs) - Linked Data - Connect Distributed Data across the Web".
  5. ^ "COAR » 7 things you should know about…Linked Data".
  6. ^ "Linked Data Basics for Techies".
  7. ^ "5 Star Open Data".
  8. ^ "public-lod@w3.org Mail Archives".
  9. ^ "SweoIG/TaskForces/CommunityProjects/LinkingOpenData/NewsArchive".
  10. ^ "SIMILE Project - Mailing Lists".
  11. ^ Linking open data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  12. ^ "SweoIG/TaskForces/CommunityProjects/LinkingOpenData - W3C Wiki". esw.w3.org. Retrieved 22 March 2018.
  13. ^ Fensel, Dieter; Facca, Federico Michele; Simperl, Elena; Ioan, Toma (2011). Semantic Web Services. Springer. p. 99. ISBN 3642191924.
  14. ^ Max. "State of the LOD Cloud". linkeddatacatalog.dws.informatik.uni-mannheim.de. Retrieved 22 March 2018.
  15. ^ "Linked open data around the clock (LATC)". latc-project.eu. Retrieved 22 March 2018.
  16. ^ "Welcome to PlanetData! - PlanetData". planet-data.eu. Retrieved 22 March 2018.
  17. ^ "DaPaaS". project.dapaas.eu. Retrieved 22 March 2018.
  18. ^ Linking Open Data 2 (LOD2)[permanent dead link]
  19. ^ "CORDIS FP7 ICT Projects – LOD2". European Commission. 2010-04-20.
  20. ^ "LOD2 Project Fact Sheet – Project Summary" (PDF). 2010-09-01. Archived from the original (PDF) on 2011-07-20. Retrieved 2010-12-18.
  21. ^ "Instance relationships amongst datasets". fu-berlin.de. Retrieved 22 March 2018.
  22. ^ "Class relationships amongst datasets". archive.org. Retrieved 22 March 2018.

Further reading

External links
