
International Area Studies Review

From Wikipedia, the free encyclopedia

International Area Studies Review
Discipline: Politics
Language: English
Edited by: Scott Gates
Publication details
History: 1997–present
Publisher: SAGE Publications
Frequency: Quarterly
Impact factor: 0.7 (2022)
Standard abbreviations
ISO 4: Int. Area Stud. Rev.
Indexing
ISSN: 2233-8659 (print); 2049-1123 (web)
OCLC no.: 841151237

The International Area Studies Review is a peer-reviewed academic journal covering all aspects of international area studies. Established in 1997, it is published quarterly by SAGE Publications on behalf of the Center for International Area Studies (Hankuk University of Foreign Studies) and the Peace Research Institute Oslo. The editor-in-chief is Scott Gates (Peace Research Institute Oslo).

YouTube Encyclopedic

  • Topic Models Applied to Online News and Reviews
  • A Special Roundtable Discussion on China and the American Election
  • Gunung Padang Pre-historic Studies - Late Night in the Midlands Radio Network

Transcription

>> LIN: Okay, hi. My name is Jimmy Lin. I work here at Google and I'm happy to introduce Professor Alice Oh. She's an assistant professor of Computer Science at the Korea Advanced Institute of Science and Technology. Professor Oh's group does research in natural language processing, machine learning and human computer interaction. And today, she'll be talking about the work that her group is doing on Topic Models related to Online News and Reviews. >> OH: Okay. I'm very glad to be here and thank you, Jimmy, for hosting this talk. So, today I'm going to be talking about Topic Models and how we apply that to online reviews and online news. And in doing so, I'm going to be talking about Topic Models a little bit just to make sure that everybody knows what Topic Models are, so it'll be a brief introduction to it. And then I'll go into the details of our work. These are sort of two mini-talks. The topics are--the second and the third items are related but they're not quite the same. And these are from two papers that my students and I recently submitted to Wisdom next year. So I keep my fingers crossed. Okay, so let's just dive into the main problem that we're going to be talking about. And, recently Google Books has announced this, right? I'm sure many of you read about this somewhere that Google Books has counted at least 130 million books out there ever written; and I'm sure there are more. So what we can see, and I'm probably preaching to the choir here, is that there is a lot of text data out there to be understood, right? So if you have 130 million books, we can ask this question, it's a very simple question when you just look at it, "What are the books about?" But it's a very challenging problem, right? So if you know anything about text processing you'll agree with me that if we have 130 million books and we're trying to figure out what actually is in those books that's a very difficult question. Topic Models is one answer, one approach to getting at that answer, okay? So the plate diagram up there is what you normally see when we talk about Topic Models so I just put it up there. We'll get back to the plate diagram in a little bit. But Topic Models, the main purpose of them is to understand--sorry, to understand and uncover the underlying semantic structure of text of your corpus, okay? So let's look at an example of what a Topic Model could do for you, right? So this is an article from the New York Times a couple days ago, a few days ago, and the title is "Economic Slowdown Catches Up With NASCAR." And as you can see from the headlines it's talking about the NASCAR car racing and it's also talking about the economic recession. And what Topic Models do for you is it kind of discovers, it uncovers what the latent topics are in an article or in a corpus, okay? So you can see in these three colors here, green, orange and a little, I guess, purplish pink color, the three topics; three of the main topics you can see from the article. So the green one is about NASCAR races and you can see sort of throughout the document I've highlighted the words that are about that topic, okay? So, NASCAR races, track, raceway, cars, et cetera. And then, the same thing with orange, is about economic recession, so you would see--or it's like sales, costs and so on. And the purple is the general sports topics. So, [INDISTINCT] Topic Models and the--a Topic Model would have, it--every document is made up of multiple topics and the title words in the document are generated from those multiple topics, okay? 
So LDA, the Latent Dirichlet Allocation, is one of the simplest Topic Models and it's very widely used, so--and it's a generative model, which means that it tries to mimic what the writing process is, right? So it tries to generate a document given the topics, okay? So let's that's how that works. So bear with me if you are experts on LDA. I'm sure some of you are. So, here again we start with the three topics, the NASCAR races, economic recession, and the general sports topic. And when you have those topics and notice the topics are made up of words--and I'm just showing you a subset of the words that have high probabilities in that topic. But actually the topics are multinomials over the entire vocabulary. So, the NASCAR race topic, it has--it gives high probabilities to those words but there other words in that topic and they have small probabilities. Okay, so when you have these multinomials over words, when you want to, say, generate three documents from these three topics, what you would do is kind of produce or kind of guess these topic distributions of the documents that you're trying to generate. So, for example, the middle--the one in the middle, the writer is thinking, you know, "I'm going to write mostly about the general sports topic and then I'll talk a little bit about maybe some of the other topics," okay? That's what the big purple bar means. Okay, so when you have those topic distributions, then what you can do is--oops, I'm going backwards. Okay, so from, say, the bottom topic distribution, you can generate the words according to that distribution, okay? So, since we have a lot of the green, you would see many green words popping up there, okay? And then the same thing you would do for the other documents. So that's the generative process of an LDA. Okay, so let's look at it from the plate diagram perspective. So here up on the left is the general--is the widely used plate diagram for this. And what you can see here is the phi's, okay, next to the beta there, are the topics, which are the multinomials over the vocabulary. And then up there up to the right corner over there are the thetas which are the topic distributions, okay? And then from those topic distributions if you want to generate one document you would generate sort of the set of rectangles I have over there which are the words or which are the topics of the words that you're going to write in your document, okay? So in this example, I'm looking at the first topic distribution and I've sort of picked out the topics that I want to write about. And then, when you have those topics then you can look up the multinomial topics over here to generate, to actually come up with the words, okay? According--so, for example, the first word you're going to write is in orange, so you're going to come here and look at the orange topic and say, "Okay, these words have high probabilities in these topics I'm going to pick one of those words," okay? So that's what you get in the actual--the rectangle down below, where you actually have the words in your document, okay, and that's how the document is generated. Okay, but in reality, what you have is the--you only observe the words. So these are the documents in your corpus, okay? So, where this is one document in your corpus and all others like the topic distributions, the topic themselves, they're latent. We don't know them. But the purpose of fitting the model then is to come up with those topic distributions and the topics, okay? 
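The generative story described here can be sketched in a few lines of Python. This is an illustrative toy, not code from the speaker: the vocabulary, number of topics, document length, and Dirichlet hyperparameters are all made-up values.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["nascar", "race", "track", "sales", "cost", "recession", "sports", "team"]
    K, V, doc_len = 3, len(vocab), 20        # topics, vocabulary size, words per document (assumed)
    alpha, beta = 0.5, 0.1                   # Dirichlet hyperparameters (assumed)

    phi = rng.dirichlet(np.full(V, beta), size=K)   # K topic-word multinomials (the "topics")
    theta = rng.dirichlet(np.full(K, alpha))        # one document's topic distribution

    doc = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)                  # pick a topic for this word position
        w = rng.choice(V, p=phi[z])                 # pick a word from that topic's multinomial
        doc.append(vocab[w])
    print(doc)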
So they--all those others must be discovered by the model. Okay, so that's what you do when you, you know, fit an LDA. Okay, so what does an output of an LDA look like? They look something like this. There are some other outputs that LDA gives you but one of the major outputs of LDA is these multinomials over words which are the topics. So the NASCAR topic, for example, has those words with high probabilities. What I put at the very bottom row is to let you know that actually every topic has every word in the vocabulary. It's just that some like the money word here in the NASCAR topic has very low probability of it. In actuality it's probably much lower than that, okay? So that's what the topics look like. So if we go back to the question I posed earlier, if you have 130 million books and you want to answer the question, "What are the books about?" then you can imagine you can sort of feed these books into an LDA to discover topics, right? So if you represent one book as one document and you run it over 130 million of them, you can discover the underlying topics, the underlying semantic structure of your corpus. Let's look at a smaller problem since it's very hard to run LDA on 130 million documents. But if we have news articles and we have about 200,000 of them over the last 12 months then we can ask the question, "What are the news articles about?" And this is something that we can try to solve. And one difference here is that time is a very important dimension here, right? Because news is inherently sequential and temporal and you want to know what happened when and how long did it last and so on, okay? So we need something that considers time. Okay, so we proposed what we call Topic Chains, which is--has the main purpose of uncovering the underlying semantic structure of a sequential corpus of news, okay? And this is work that my student did mostly and so if you have any detailed questions about it you can send them an email, although I'll try to answer most of them. By the way, if you have any questions, feel free to raise your hand and ask. Okay, so let's look at what we have, what kind of tools that we have available to us now. So if you look at the New York Times, if you kind of scrolled down to the bottom half of the page, this is what you get. You get a quick news-at-a-glance type of a thing, right? So you can look at the New York Times, the front page of that website and kind of figure out what's been going on in the last couple of days, okay, in terms of technology, the world, business, arts and so on. And this is a very good view and I love to look at it. But this, as you can imagine, takes a lot of intelligence and a lot of work, right? So this is a product of, you know, intelligent New York Times editors out there who are trying--who are putting this together. And plus, it doesn't have the dimension of time because this is a snapshot of the news, right? So here, at Google, somebody has made this really interesting tool it's called Google News Timeline. It's still in the Google Labs, so you may not--may or may not know. But this is where you can look at the sequential issues and events, right? So, right now, it's showing the monthly view. So you can see what was the more--most important news in March of 2009 and so on. And you can search, too. So if you search for a certain keyword then you would get articles that are about that keyword, right, and you can look at the weekly view and the yearly view as well, I think, and the daily, of course. 
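Since the main output of LDA is a set of multinomials over the whole vocabulary, a topic is usually summarized by its highest-probability words. A small illustrative helper (hypothetical, assuming a fitted K x V topic matrix `phi` and a word list `vocab`):

    import numpy as np

    def top_words(phi, vocab, n=10):
        """Return the n highest-probability words for each topic in phi (K x V)."""
        return [[vocab[i] for i in np.argsort(topic)[::-1][:n]] for topic in phi]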
So this is pretty cool but I think that here we still have some questions that are unresolved. For example, if you have an article, say, there's an important article in March of 2009, are there similar articles that follow that are talking about the same thing in April of 2009, perhaps a couple or a few months later, right? And if there are similar articles talking about the same topic over a long period of time, how long is that period of time? How long did that topic last, right? And if it's a long-lasting topic, then is it part of a general sort of professional topic like the U.S. economy or was it part of a long, running sort of event or issue such as the H1N1 issue, right? Or was it part of a very short, temporary topic such as the death of Michael Jackson, okay? So we would like to know those things when we look at the articles, but at least what we saw in the previous slide didn't really show that. And if it's a general, sort of long-running topic like the H1N1, for example, the topic itself kind of evolves through the nine months or how many--however many months that it lasts, okay? First, it was talking about the outbreak, perhaps, maybe it was talking about travel restrictions and then vaccinations that's--schools and so on, okay? So we would like to see how the same topic evolves through time. So, what we propose is something like this. This is a part of our results, is that you can look at several months of news and kind of look at the topics and how they're clustered together in what we call Topic Chains. So you--here you see that there's a Topic Chain about labor unions, education, the War in Afghanistan, the swine flu and so on. And then you would see some events like there was a terror in Hong Kong or something like that and the death of Michael Jackson. So we produced something like this where you can see the general perpetual topics, you can see the long-running "It" topics and then you can see sort of the temporary events that happened. So, this is what we call the Topic Chains and this is the plan that we had to do something like that, right? So what we did is we took a bunch of articles over a bunch of months and we divide the corpus into time slices and we just chose time slice of 10 days each, okay? And for each time slice you would have a bunch of articles, right, and we can find the topics using just the simple LDA. And when you have the topics from the LDA then you can try to match them up to see which topics are similar, okay? And when you have the similar topics, you can sort of link them up into Topic Chains. And once you--yeah? >> At what similarity are topics based on? >> OH: Yes, so I'll talk about similarity metrics in the next slide, I think, or a couple of slides. And then once we have the Topic Chains we can identify which are the long topics, which are the short topics, and within the long topics we can sort of see what the topic evolution looks like. Okay, so let me talk now about sort of each of those steps; except for the first one because that one is trivial. So we worked with the corpus of nine months of news in Korea. So we took the websites of three major newspapers and collected documents and articles from all of them. The corpus looks like that, 130,000 articles, 140,000 unique words, named entities, and we chose 50 as the top--the number of topics per each time slice for a total of 1,400 topics. And let me just show you the results of the LDA; finding topics using LDA. 
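The per-slice step of that plan can be roughly sketched as follows. This is not the pipeline used for the Korean news corpus; it assumes a hypothetical pandas DataFrame `articles` with a datetime 'date' column and a 'text' column, and uses scikit-learn's LDA with the talk's settings of 10-day slices and 50 topics per slice.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def topics_per_slice(articles, days=10, n_topics=50):
        vectorizer = CountVectorizer(max_features=20000)
        X = vectorizer.fit_transform(articles["text"])
        vocab = vectorizer.get_feature_names_out()
        slices = articles["date"].dt.floor(f"{days}D")   # bucket each article into a 10-day slice
        topics = {}
        for t in sorted(slices.unique()):
            idx = np.flatnonzero((slices == t).to_numpy())
            lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
            lda.fit(X[idx])
            phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # rows become multinomials
            topics[t] = phi
        return topics, vocab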
And there's, since there's 1,400 of them, I can't really show you them all, but I'm just showing you four examples of how the topics turned out pretty good. The first one you can see is about sports. The second one is about business and then about smartphones and technology. And the last thing, the last topic is about Academia. So when we have those topics, we can construct Topic Chains like this where you look for similar topics within certain window size. And we also--I'll show you an experiment that we did with increasing or decreasing the window size and what happens there. But what it means is do you look at only the time slice before or how many time slices do you go back to find the similar topics? Okay, so here comes the answer to that question, measuring similarity. This is kind of an important issue because the major thing about Topic Chains is that we're finding similar topics, right? So, remember, the topics look like those, which are multinomials over words. And so you can imagine various ways to measure similarity and within the Topping modeling research community people have used most of these metrics. Most notably, they usually use KL divergence or cosine similar--cosine, yeah, cosine similarity. And I've kind of categorized these six--or is it six or five--six similarity metrics by how each metric looks at the topics, okay? So the first thing, you can look at--or I said that our topic is multinomial over the vocabulary, right? So if you have two probability distributions and you want to measure the distance, then KL divergence is the answer, right, or JS divergence, which is the symmetric version of KL divergence, okay? Or you can look at a topic as a vector where each dimension is a probability of the word in the topic. So if you take that view then you can use cosine similarity because to measure the distance between two vectors, right? Or you can use Kendall's Tau if you look at a topic as a list of ranked words, a ranked list of words. So if we just ignore the probabilities but just look at, you know, NASCAR is the first rank and so on then we can use Kendall's Tau or DCG which is used I guess a lot in information retrieval. And then lastly, if you look at only the subset of words that have top--high probabilities, then we can look at the intersection and unions of sets, which we can measure with Jaccard's coefficient. So we wanted to test these metrics to see which would be the most or the best performing similarity metric, okay? So what we did is this, we computed the log likelihood of data of the corpus, given the topics that LDA found. What that means is if you have a small negative log likelihood of data given the topics, then that means your topics are explaining your corpus very well, okay? So the higher the value, there's sort of a mismatch between your topics and your corpus, okay? So it's kind of like perplexity, too. So what we did is we took an original set of 50 topics that LDA found for each--for one time slice and replaced five of those topics with similar topics that are found by each of the metrics, each of the six metrics, okay? So, for example, you know, if KL divergence says, among these 50 topics and then another set of 50 topics in the next time slice these five or the most similar pairs, then we replace the topics from the second time slice and kind of put them in the first time slice. So you would have the 45 of the original topics plus five new ones that KL says is most similar or most similar, okay? 
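Two of the metrics mentioned above, written out as a hypothetical sketch: Jensen-Shannon divergence treats the two topics as probability distributions, while cosine similarity treats them as vectors of word probabilities.

    import numpy as np
    from scipy.stats import entropy          # entropy(p, q) computes the KL divergence KL(p || q)

    def js_divergence(p, q):
        m = 0.5 * (p + q)
        return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

    def cosine_similarity(p, q):
        return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))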
So then, when we compute the log likelihood of the modified topics or the log likelihood of the data given the modified set of topics, then we can see which of the similarity metrics found the most similar topics, okay? So, as I said before, KL divergence and cosine similarity are most often used similarity metrics and we found that JS divergence actually performs a little bit better. And the asterisks next to the metrics mean that there is a significant difference, statistically significant difference, between that metric and JS divergence. And Jaccard's coefficient performs pretty well too, but we didn't use that because you have to have this parameter. There's a parameter that we had to set and we thought that that's probably not as general as just using JS divergence with no parameter. So, that's what we chose to use as our similarity metric for constructing the Topic Chains. And let's now talk about the size of the window, the size of the sliding window. So if we can take the [INDISTINCT] assumption and just look at one time slice to find a similar topics but then you wouldn't find that case of, sort of the long arrow over there, where a topic was kind of an important issue for awhile and then it kind of disappears for a few weeks and then comes back again. So, we didn't want to miss that similarity chain there. So, this is what we did as the experiment to see how the sliding window size affects the resulting Topic Chains. So, up at the very top, is a set of Topic Chains found when we use the sliding window of size one looking back just one time cycle, and then at the bottom is the sliding window of size six. So, it's kind of an obvious result, but you can see when you're looking back only one time slice then the Topic Chains are kind of fragmented and they're kind of dispersed all over the place. But as you increase the window size, then the Topic Chains over there, that had sort of a little gap at the middle--in the middle and then continued a few weeks later, they kind of merged together, right. So the Topic Chains has become larger, longer, and you would find these pretty large Topic Chains at the bottom. What's interesting is, if you look the middle one, where there are two major ones and then they come together at the size of five, those Topic Chains are about technology and business. So one of them is about the technology itself, sort of manufacturing, research, development, type of topics and then the other one is sort of the business side of the technology, okay? So, you can see that they kind of merge together at the window size of five, and it's kind of hard to interpret that in terms of what it means for the user, right? So if the user wants those to be kind of separately separate then that's probably what we should do. But if you want them to be sort of in this same Topic Chain then you might want to go with the larger window size. But, in general, as you increase the size of the sliding window, the Topic Chains tend to become more abstract. So at the end, you would have something that's similar to like the sections in your newspaper, right? So, so business, life and, you know, culture, and, you know, the world news, and so on, right; whereas sort of in the middle you would have sort of more concrete topics. Okay, so that's what that shows. Let's look a little bit closer at the chains themselves, what they mean. 
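The sliding-window linking step can be sketched as below, reusing the js_divergence helper from the earlier sketch. The window size and the divergence threshold are illustrative parameters, not the values used in the paper; in Topic Chains the linked topics are then grouped into chains (connected components of these links).

    def link_similar_topics(slice_topics, window=3, threshold=0.4):
        """slice_topics: list, one K x V topic matrix per time slice (assumed input)."""
        links = []                                    # (slice_t, topic_i, slice_s, topic_j)
        for t, topics_t in enumerate(slice_topics):
            for s in range(max(0, t - window), t):    # look back up to `window` slices
                for i, topic_i in enumerate(topics_t):
                    for j, topic_j in enumerate(slice_topics[s]):
                        if js_divergence(topic_i, topic_j) < threshold:
                            links.append((t, i, s, j))
        return links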
So if we look at the long chains, for example, the swine flu chain, right, you want to know more than just that there was this big chain of swine flu and we can see that, you know, in 2009 it was kind of a big issue for most of the year, we want to know how that topic actually changed, okay? So as I talked about before, first it was talking about the outbreak and then vaccinations and so on, right? So we call those focus shifts. So within a Topic Chain, we can look at how the focus shifts in the chain. And I apologize for the small font. It's really hard to see. But this Topic Chain is about automobile industry, okay? So, I'll just read you--on topic number one has the tab words automobiles, Vietnam, Kia Motors, vehicle and sales. And topic number three, which is right below that, is develop, technology, automobile, investment, and industry, okay? And then the other topics are pretty similar to that. So you really can't tell what is going on just by looking at the topics themselves, okay? So what we wanted to do is look at words that changed the most between two similar topics. So, what happened between this topic and the next topic? And if we look at the words that are not common but are most different among the two topics, then you can sort of figure out what's been going on. And on top of that, we looked at just the named entities. Named entities are things like names of organizations, names of people, sort of specific things like that, because a lot of the news, a lot of the events that happened in the news are about specific people or organizations and so on, okay? So coming--going from number one to number three, again, when we do that the--excuse me--the named entities that we find are green, solar, Japan, energy, and what's the last one--and carbon, yeah. So that tells you that there was something going on in the second--so those are the words that changed--that increased the most in probability from topic number one to topic number three. And we--if we just go back and look at the headlines, you see that there was--there were a few headlines that are talking about Japanese carmakers like Toyota coming up with solar powered cars, okay? So, if we look at this sort of closed--close up view of the Topic Chains and the named entities that changed, then we can have a much deeper understanding of the evolution of the topics. Okay, now, let's look at the short chains and this is pretty interesting. Here, every line is a topic and the left column is the date. So, 0P 07 means the first 10 days of July in 2009, there was--there was a missile launched. There was a discussion over the North Korea missile launch. And then the next line talks about the death of Michael Jackson and then some milk scandal and then some heightened--a topic about heightened security at the end of the year and then some romance over, you know, entertainment people. In April of--April is when Korea has Arbor Day, so talking about trees and stuff like that. And then the last topic is kind of interesting. Obama, Republicans, Jeju Island is an island in Korea that's used for resorts and playing golf and you see golf and Tiger Woods. But, I don't know if Obama went there to play golf or not, with Tiger Woods maybe, but what we can interpret that is--that LDA found a topic that's kind of not about a single topic and LDA often does that. If you've ever run LDA or any other Topic Models--oops--you'll find many of--or some of the results, some of the topics that you find are not really coherent. 
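The focus-shift idea, i.e. surfacing the words whose probability increased the most between two adjacent topics in a chain (optionally restricted to named entities), could look like this hypothetical sketch:

    import numpy as np

    def focus_shift(topic_old, topic_new, vocab, n=5, named_entities=None):
        delta = topic_new - topic_old                      # change in word probabilities
        candidates = [vocab[i] for i in np.argsort(delta)[::-1]]
        if named_entities is not None:
            candidates = [w for w in candidates if w in named_entities]
        return candidates[:n]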
So, anyway, these are short Topic Chains which means they're like two or three topics or even one, two, or three topics, and they represent mostly temporal events, temporal issues or they could be about incoherent topics. And you can kind of see how, if it's a coherent topic, then it would more easily find similar topics in the next time windows, right? Okay, so that's actually the end of the Topic Chains part of the talk. How am I doing on time? Okay. I'll go quickly over the next topic. So, we propose Topic Chains, which is a framework based on very simple LDA to understand what's going on in the news corpus. Okay, now, let me switch gears and talk about sentiment and aspects and reviews. So, the model is called Aspect Sentiment Unification Model and its main purpose is to uncover the structure of aspects and sentiments in a review, okay? And this is another student of mine who worked on this mostly. And the promise is this, if you go to Amazon--this is a review of a digital camera. It's a very long review. It's like a--it's like a conference paper almost and this is actually not the end, there's more. But it's a very, you know, detailed review. He talks about or this user talks about a lot of good things and bad things about this camera and we want to do something like this, right? Amazon does sort of aspect or attribute based sentiment analysis of the review. So, in addition to the general, how many stars did this camera get? It also gives you how many stars for the picture quality and so on. The way Amazon does it--I don't know exactly how they do it, but I noticed that--so this is a camera with lots and lots of reviews like 300 reviews or so. And then there's--the same Canon digital camera, certain other models which have very few reviews, and for those we actually don't have these attributes. So it looks like there's some manual work and some automated way of looking at what the attributes are. And we call those attributes aspects, and there are things like this; this thing is small and it's light, starts up and turns off fast, the low light performance is best. And so these are actual sentences from the reviews. And the sentiment is something like this. The words highlighted in pink are the ones that carry sentiment for this--for each sentence. Okay. So let's look a little bit closely at what these sentiment words are, okay? Some of them are general sort of effective words that express emotion like love; "I love this," "I'm satisfied," "I'm dissatisfied," "I'm disappointed," okay? And then, some of the other ones are general sentiment words like, "Best, excellent, bad." There--they evaluate the quality of something, but they're just general. If something is best, then it's best no matter if it's a coffeemaker or a chair, right? And then there are aspect specific evaluative words. And this is a little more fine-grained than domain specific evaluative words. So let me show you what I mean. In the camera domain, okay, if you say, "This camera is small," it's probably a good thing. "The LCD is small," it's probably a bad thing, right? If you're in the restaurant domain, "The beer was cold," is good. "Pizza was cold" is bad. And, "The wine list is long," is good and "The wait is long," is bad. So, beyond the domain, right, we need to go sort of down to each aspect of the review and say whether the sentiment word there expresses positive or negative sentiment. Okay. So this is the problem that we're trying to solve. 
Okay, we're trying to discover the aspects automatically as well as the sentiment and the words that carry the sentiment. So to do that we made two models, one is called "Sentence LDA," the other is called "Aspect Sentiment Unification Model." And we worked with two types of corpura. The first one was Amazon reviews and we took seven product categories, including digital cameras, coffeemakers, I think, heaters and things like that. They're just pretty different electronic products--oops. And we also looked at the Yelp restaurant reviews over four cities and 328 restaurants. And on average, each review had 12 sentences. And our observation starts again with the same set of sentences. What we noticed here is that for many of the sentences in the reviews, one sentence describes only one aspect, okay? And this is different from the general LDA assumption which is that each word in the corpus, each word in the document, represents or is generated from one topic or if you applied it to aspects, one aspect, okay? So we wanted to make this sentence LDA. If you noticed, the only difference is the box around the W circle, okay? So what that means is the words--and N is the number of words in your documents, so each word is generated--but Z, which are the topics, are over M, which is the number of sentences, okay? So we're saying there are only M aspects in that document, which is the number of sentences in that review, okay? And each sentence has one topic or one aspect, okay? So that's the basic difference between LDA and SLDA. And what we found is that when we run SLDA over our data--so this is--oh, this--yeah, the results from both the Amazon reviews, and the last one actually is from the restaurant review. So, remember, we ran SLDA over all sort of seven categories of electronics reviews and we get these aspects. They are similar to what we saw earlier in the Amazon attribute categories, okay? So the portability, quality of photo and ease of use, those are the three--and the camera product. And then if you look at the laptop reviews the first one is about software and OS and then the second one is about hardware and so on. And some--and what we found when we compared the results of LDA versus SLDA is that LDA or SLDA was finding more product specific aspects, okay? For example, the last one, liquors category, liquors topic or aspect was not found by LDA. Instead, the words like beer and wine and martini was actually one of the top words, too. They were kind of spread out over different topics, like wine was maybe with the Italian food aspect and so on. So, I think it's important to notice that SLDA, because of that one difference in the assumption it makes, finds better product specific aspects, details of the reviews. Okay, we then took SLDA and extended it to form a joint model over aspects and sentiments, okay? So the right side of the model, which has the gamma, the pi, and the S, so S is the sentiment and you can see that word is now generated from a pair of sentiment and aspect, okay? So, with this joint model then, if we run it over the corpus, without using the labels, any labels of the corpus, of the documents, we can automatically discover the aspects and the sentiments, okay? But we do use seed words. We took Turney's Paradigm words because they're kind of generic paradigm sort of sentiment words that a lot of people use, like good and nice, and bad and nasty and then--so that was one set of seed words we used. 
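The single difference between LDA and Sentence LDA described above, one aspect per sentence rather than one per word, shows up clearly in a toy generative sketch (illustrative sizes and hyperparameters, not the speaker's code):

    import numpy as np

    rng = np.random.default_rng(1)
    K, V = 5, 200                                    # aspects and vocabulary size (assumed)
    phi = rng.dirichlet(np.full(V, 0.1), size=K)     # aspect-word multinomials
    theta = rng.dirichlet(np.full(K, 0.5))           # one review's aspect distribution

    review = []
    for sentence_len in [7, 9, 6]:                   # a three-sentence review
        z = rng.choice(K, p=theta)                   # ONE aspect for the whole sentence
        words = rng.choice(V, size=sentence_len, p=phi[z])
        review.append((z, words))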
We also augmented the paradigm words a little bit with other sort of general sentiment words that we found from the corpus, okay? So--and what we do with this--with these sentiment words that are a little bit different from other prior work in this joint modeling of sentiment and aspect is that we build the seed words right into the model by playing with the priors of the LDA, okay? So setting asymmetric priors and initializing Gibbs sampling, which is an inference algorithm to kind of play with the seed words, okay? And I'll--let me explain that a little bit better here, although I didn't even talk about Gibbs sampling, so if you want to explain that, we can talk later after the talk. So beta is the prior for the Dirichlet Distribution over the phi's, okay? And what that means is, do we start with a uniform distribution of betas, which means that every distribution is equally likely? If we play with the betas and do asymmetric priors, then we're saying some of the distributions are more likely than others. Okay, so what we do--what we did with SLDA is we just used the uniform priors. What we do with the betas here is we set zero--beta to zero, for any negative sentiment seed words in the other, in the opposite--in the positive sentiments, okay? That means--and we do vice-versa for the negative--for the positive sentiment seed words, okay? That means if you have a positive seed word like "good," then it's not going to be assigned a non-zero probability in a negative sentiment--negative aspect sentiment, okay? And also, we start Gibbs sampling, we initialize a Gibbs sampling by setting the positive seed words to have positive sentiment and the negative seed words to have negative sentiment. So that's opposed to randomly assigning sentiment, which is what we usually do for Gibbs sampling, okay? So the combination of those two makes the seed words kind of right into the model without fidgeting anymore with the words themselves. Okay. So these are senti aspects discovered by ASUM, as we call the model, and these--so, every multinomial now is--so every word in the multinomial is generated either by the sentiment or by the aspect or they're actually jointly by the pair sentiment and aspect, okay? So interesting results here, like the meat senti aspects here--meat positive and meat negative, the meat aspect was not found by SLDA. So what this tells you is that for some senti aspects, if there's strong sort of sentiment correlated with that aspect, then it comes out better with the ASUM model than it does for just SLDA without sentiments built-in. So in original SLDA, what happens is the meat aspect is kind of scattered around again in, you know, in pizza, in burger steak, right? Those aspects have meat words in them. But because we've forced kind of the sentiment to play a bigger role in finding the aspects, we see aspects like that. And an interesting case that we see with payment is that we only get a negative aspect for payment. We don't get a positive senti aspect for payment. It's the same thing with parking, too. What that tells us, just an interesting bit, is that people complain about payment not being able to use their credit card or they complain about parking situations. But if they have some satisfaction with it, they don't really write it in the reviews. Okay. And the yummy aspect is kind of funny; too, with the last word is funny, right? So that aspect is something that LDA doesn't really find. Okay. So let me go on. 
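How the seed words are folded into the model via asymmetric priors can be sketched as follows. This is a hypothetical illustration: the base prior value and the seed lists are placeholders. The idea is that a negative seed word gets a zero Dirichlet prior in every positive senti-aspect, and vice versa, so it can never be assigned there; Gibbs sampling is then initialized by giving sentences containing positive seeds a positive sentiment label (and likewise for negative) instead of a random one.

    import numpy as np

    def build_asymmetric_beta(vocab, pos_seeds, neg_seeds, n_aspects, base=0.01):
        V = len(vocab)
        index = {w: i for i, w in enumerate(vocab)}
        beta_pos = np.full((n_aspects, V), base)      # priors for positive senti-aspects
        beta_neg = np.full((n_aspects, V), base)      # priors for negative senti-aspects
        for w in neg_seeds:                           # negative seeds blocked from positive senti-aspects
            if w in index:
                beta_pos[:, index[w]] = 0.0
        for w in pos_seeds:                           # positive seeds blocked from negative senti-aspects
            if w in index:
                beta_neg[:, index[w]] = 0.0
        return beta_pos, beta_neg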
So what we can do with these topics, okay, the words in the topics is then we can try to figure out which are the sentiment words and which are the aspect words, right? So if we have the two meat senti aspect words, we can look at the words that appear common across the two sentiments like meat and--I don't know what else, sauce, I think, and we say, "Those are the common sort of aspect words for the meat category," okay? Whereas, things like "crispy" or "blend" are the sentiment carrying words for that aspect, okay? So those are the aspect specific sentiment words. And so, what we do is we align the senti aspects with the similar aspects again and then we look at the positive aspects, the negative aspects, and we look at the common words and the words that have a lot of difference in them to figure out things like this. The screen aspect, the words, the common words are like screen, glossy LCD and then the sentiment words are like bright, clear. Those are the positive words and, like, reflect, glare--MacBook, obviously, it doesn't have a good screen. So--or apparently. I don't agree with that personally, but anyway, so that's what we can do with those topics. Okay, let's look at some other results of this. So here is a result that shows you that sentiment classification per sentence is done pretty well. So we--so these are two reviews. The first one is about a coffeemaker and then the second one is about a restaurant. And you can see in green, those are the positive sentiment sentences. So that's what the model found as the positive sentences and the ones in pink are the--where the model found them to be negative. And, of course, I'm going to show you good examples. But most of the examples are pretty good, okay? So, another set of results we can do--we can look at is, how well are the aspects assigned to the sentences, right? So these are four different reviews where the same aspect was found. The aspect--the senti aspect of parking and the negative sentiment, and you can see that parking is only validated for three hours and so on. So those are--and these came out pretty well, right? Here's another example. Some of these things like very convenient, how the model find--found that to be coffeemaker easy? It could be some--I don't know. It's probably just because convenient is up there as one of the top probability words for that senti aspect. Some of the other shorter sentences this model has trouble with because there's not enough clue, okay? But--and I do like to show some of the bad examples as well. So the second one, it took us several uses to understand how much coffee to use. That's obviously not a positive sentiment, but the model classified it as that. But, you know, one out of five is not bad, right? Okay, so those are the senti aspects assigned to sentences that we can see. Oh, and the last one, I put in there to show you that our assumption that one sentence carries one aspect may not always be true, right? So the last sentence is talking about how nice it looks and how easy it is to use. But you can kind of say, you know, "Are they the same thing or not?" I don't think they're really the same thing. Nice looking is probably--or it should be another aspect, sort of the design aspect of the product and then the ease of use is the usability of the product, right? So we do see a lot of sentences in our corpus that do not validate our assumption, that of one sentence equals one aspect or even one sentence is one sentiment. 
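Splitting a matched positive/negative pair of senti-aspects into shared aspect words and polarity-specific sentiment words can be sketched like this (the top-k cutoff is an assumed parameter):

    import numpy as np

    def split_aspect_and_sentiment_words(phi_pos, phi_neg, vocab, k=30):
        top = lambda phi: {vocab[i] for i in np.argsort(phi)[::-1][:k]}
        pos, neg = top(phi_pos), top(phi_neg)
        return {"aspect": pos & neg,        # words common to both polarities, e.g. "screen", "LCD"
                "positive": pos - neg,      # e.g. "bright", "clear"
                "negative": neg - pos}      # e.g. "reflect", "glare"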
But--so that's future work for us, right, to deal with sentences like that. Okay, so all of these that I've--the results that I've showed you, because of the way Topic Models, you know, they produce these topics, it's really hard to evaluate them. There's really no good way to quantitatively evaluate the aspect. So we can't ask users to go through 20,000 reviews and find all the aspects and kind of compare them against our results, right, or the sentiments. So the sentiment would be a little bit easier to do. So what we do with the sentiment actually, is we quantitatively measured how sentiment classification is done against other generative models that jointly model sentiment and aspect together. And so, this is--I have to tell you, though, so these--our model as well these other models, JST and TSM are the two models that we're comparing against, they're all not designed for sentiment classification per se. So they're all trying to discover aspects and sentiments together and come up with these sentiment words and so on. They're not, you know, models to do classification and neither is ours. But we put this experiment in there to show at least that sentiments are found well, okay? So let me explain the different things. ASUM, the blue, is our model with the regular paradigm words. I think there's like a dozen of the paradigm words that I showed you. ASUM Plus is the paradigm plus words, the augmented list of words, and then GST Plus and TSM Plus we also--they also use seed words, so we use the same set of paradigm plus words. And we implemented those two models and rank classification over our own corpus to see how they--how well they perform. So you can see the red line, which is ASUM Plus, performs the best. The next one is the blue with--so, ASUM without the paradigm plus words and then the other models don't perform as well on the classification task. So, just to tell you, these models are pretty similar to ours. They both don't have the one sentence, one aspect assumption built-in. They don't use the seed words right into the model they do something else with the seed words. So those are sort of the main differences I would say between those two models and our model. Okay, so let me just wrap up. I think time is probably up, too. I just talked about SLDA and ASUM, which are the two models, extensions of the basic LDA to discover sentiment and aspect together. And we discovered that specific aspects that we found were pretty well aligned with the details of the reviews that people actually write and we can, by looking at the topics and the words within the topics, we can learn aspect-specific sentiment words. And lastly, we just tested with sentiment classification and found that it performs pretty well, okay? So just to wrap up now really, Topic Chains and ASUM are the two things that I've talked about and they both work with LDA, right? So they're on different domains, one is on the news domain and then the other one is in the reviews domain. We're trying to do the same thing. We're trying to uncover what is latent, what is the hidden semantic structure within the news corpus and within the reviews corpus. So if you're interested in our work, further discussions with me and my students, please send us email or you can look at our website to see what the latest things are going on. Okay. Thank you. Questions? >> Could you comment on what you discovered when you ran these two books? Like you were mentioning the... >> OH: Yeah. >> ...130 million books. 
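A minimal sketch, not the evaluation code from the talk, of turning the per-sentence output into a review-level sentiment label for a classification experiment like the one above:

    def classify_review(sentence_sentiments):
        """sentence_sentiments: list of 'pos'/'neg' labels, one per sentence (assumed input)."""
        pos = sum(1 for s in sentence_sentiments if s == "pos")
        return "positive" if pos >= len(sentence_sentiments) - pos else "negative"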
These books are a lot different than... >> OH: Yeah, yeah. So we didn't--we haven't done that. >> You haven't done that? Oh, okay. >> OH: Yeah. Well, I should ask Google Books to do that for me. I don't--I can't imagine what the results would be. One thing about LDA, though is that--and Topic Models in general, is that they are very computationally expensive. As you can imagine, right? If--as the document size grows large, the number of vocabulary--unique vocabulary grows large and you're doing Gibbs sampling over all your vocabulary at each iteration, and you have to do thousands of iterations to converge. So the inference part is difficult. And we would like to maybe use Google and, you know, use distributed computing and all that to figure out how to do that. Oh, question? >> So the aspects you found usually it seems not to the ones defined by users, like the camera. Is there any way you can specify, "Okay, I like to find those aspects specified by users?" >> OH: Yeah, that's a good question. So we are thinking about it. We haven't done anything. I don't know how to do it. Sort of like the sentiment seed words you can have maybe seed words for aspects too, to say, "I want to find these aspects." Good question and a good idea for an extension of this work. Okay. Thank you, again.

Abstracting and indexing

The journal is abstracted and indexed in EBSCOhost, ProQuest, and Scopus.
