Predicting Hotel Review Ratings with TensorFlow

A traveler review might look like this:

Let’s see how we can build a deep learning model that predicts the reviewer’s rating, an integer in the range [1–5].

Install the Kaggle API if you don’t already have it:

You can upload the kaggle.json credentials file with the following widget:

This lets you upload the file from your local PC; then run the following cell:
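The kaggle CLI looks for the token at ~/.kaggle/kaggle.json, so that cell typically moves the uploaded file there and restricts its permissions. A sketch (prefix each line with `!` in Colab):

```shell
# Place the uploaded Kaggle API token where the CLI expects it.
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
```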

By now you should be able to use the kaggle CLI, so let’s download the necessary data with the following commands:
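The commands would look something like the following. The dataset slugs are assumptions based on the zip names mentioned below; search Kaggle for the exact owner/name if they differ:

```shell
# TripAdvisor hotel reviews dataset (slug is an assumption).
kaggle datasets download -d andrewmvd/trip-advisor-hotel-reviews
# Slug below is a placeholder; search Kaggle for "glove twitter 100d".
kaggle datasets download -d <owner>/glovetwitter100d
```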

and then you should see some logs like:

Here we downloaded two datasets: trip-advisor-hotel-reviews.zip, which contains the TripAdvisor reviews, and glovetwitter100d.zip, which contains pre-trained word embedding vectors.

In a few words, word embeddings are a representation of words used in NLP tasks that encodes each word as a vector, such that similar words are represented by vectors with high cosine similarity (close to each other). They capture text semantics and word meanings very well.
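The cosine-similarity idea can be shown with a tiny numpy sketch; the 3-d vectors below are made up for illustration (real GloVe vectors have 100 dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" standing in for real word vectors.
hotel = np.array([0.9, 0.1, 0.0])
motel = np.array([0.8, 0.2, 0.1])
pizza = np.array([0.0, 0.9, 0.4])

# Semantically similar words should score higher than unrelated ones.
assert cosine_similarity(hotel, motel) > cosine_similarity(hotel, pizza)
```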

Neural networks can learn word embeddings, and it’s pretty straightforward with TensorFlow and Keras. However, it can take a very long time to train, and you may need a very large dataset to capture rich language semantics. Fortunately, many people have already trained word vectors on very large text corpora, so we can easily load them and build a neural network on top. Today we will be doing transfer learning, using word vectors trained on Twitter text.

Let’s see how many reviews we have for each rating value.
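A minimal way to count reviews per rating, shown on a synthetic list (the real code would count the dataset’s rating column, whose exact name is an assumption):

```python
from collections import Counter

# Synthetic ratings standing in for the dataset's rating column.
ratings = [5, 5, 5, 4, 4, 3, 2, 1, 5, 4]

counts = Counter(ratings)
for rating in sorted(counts):
    print(rating, counts[rating])
```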

As you can see, the dataset is quite unbalanced, which might have a negative impact on prediction accuracy, but let’s continue exploring and see what we can get.

Digits don’t help much in building semantics for sentiment analysis, so let’s remove them.
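A one-line regex does the job; a sketch of such a cleaning step:

```python
import re

def remove_digits(text):
    # Strip every digit run; sentiment rarely depends on raw numbers.
    return re.sub(r"\d+", "", text)

print(remove_digits("stayed 3 nights in room 204"))
```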

Let’s load the word vectors we downloaded earlier. In this case I am using word vectors of 100 dimensions built from a Twitter text corpus.
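GloVe files are plain text, one word followed by its vector components per line. A small parser, demonstrated here on a tiny in-memory sample instead of the real glove.twitter.27B.100d.txt:

```python
import io
import numpy as np

def load_glove(file_obj):
    """Parse GloVe's plain-text format: one word plus its vector per line."""
    embeddings = {}
    for line in file_obj:
        parts = line.rstrip().split(" ")
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# Tiny 3-d sample standing in for the real 100-d file.
sample = io.StringIO("hotel 0.1 0.2 0.3\nroom 0.4 0.5 0.6\n")
vectors = load_glove(sample)
print(len(vectors), vectors["hotel"].shape)
```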

Run this cell if you want to compute autocorrections:
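One simple way to autocorrect misspelled tokens (a sketch; the original cell may have used a different approach) is to map each unknown word to its closest vocabulary entry by string similarity:

```python
import difflib

# Toy vocabulary standing in for the embedding vocabulary.
vocabulary = {"hotel", "room", "staff", "location", "breakfast"}

def autocorrect(word, vocab=vocabulary):
    """Map a misspelled word to its closest vocabulary entry, if close enough."""
    if word in vocab:
        return word
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(autocorrect("hotell"))
```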

Let’s now add word vectors for out-of-vocabulary (OOV) words.
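One common choice (an assumption here, not necessarily the author’s exact code) is to give each OOV word a random vector with a scale similar to the trained ones, so every token gets its own embedding row:

```python
import numpy as np

rng = np.random.default_rng(42)
embedding_dim = 100

# Stand-in for the loaded GloVe vectors.
embeddings = {"hotel": rng.normal(0.0, 0.6, embedding_dim)}
vocab = ["hotel", "qwertyuiop"]  # second token is out-of-vocabulary

for word in vocab:
    if word not in embeddings:
        # Scale chosen to roughly match typical GloVe magnitudes (assumption).
        embeddings[word] = rng.normal(0.0, 0.6, embedding_dim)

print(len(embeddings))
```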

Now that we have preprocessed our data and computed word vectors, it’s time to build our neural network architecture.

I am addressing this as a regression problem. We have user reviews and labels representing user ratings, which take integer values from 1 to 5.

After some experiments with different model architectures, I noticed that often, even for a human, it can be hard to tell whether a review deserves a 4 or a 5, a 3 or a 4, and so on. It depends on the reviewer’s sentiment, which may not be fully reflected in the review text. So, assuming that a prediction in the range [4–5] is a good prediction if the true label is either 4 or 5, let’s build a deep neural network that regresses the rating values. The predicted label can be computed later by rounding the model’s outputs.
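That rounding step can be sketched with numpy; clipping keeps the result inside the valid rating range:

```python
import numpy as np

def to_rating(predictions):
    # Round the regressor's output and clip to the valid 1..5 range.
    return np.clip(np.rint(predictions), 1, 5).astype(int)

print(to_rating(np.array([0.7, 3.4, 4.6, 5.3])))  # [1 3 5 5]
```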

For this model I am using the Adam optimizer and Mean Squared Error (MSE) as the loss function; MSE heavily penalizes predictions with large errors. I am also adding Mean Absolute Error (MAE) to the metrics, as it gives us a more interpretable idea of how well the model is performing.

Note that the total number of parameters is 5,065,013, while the number of trainable parameters is 109,313. This is because we froze the parameters of the word embeddings layer so that we don’t re-train them.
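A sketch of such an architecture: the recurrent layer and sizes are assumptions, and the embedding matrix here is synthetic, so the parameter counts won’t match the figures above. The key part is the frozen Embedding layer (trainable=False), which is what makes most parameters non-trainable:

```python
import numpy as np
import tensorflow as tf

vocab_size, embedding_dim, max_len = 20_000, 100, 200
# Synthetic stand-in for the pre-trained GloVe matrix.
embedding_matrix = np.random.normal(0.0, 0.6, (vocab_size, embedding_dim))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,), dtype="int32"),
    tf.keras.layers.Embedding(
        vocab_size, embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),  # freeze the pre-trained word vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # single linear output: the regressed rating
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()
```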

Let’s go on with model training, but first let’s install a library that plots the learning curve in real time while the model trains. That helps us see whether training is going well and detect overfitting as it happens.

Now we can pass an instance of PlotLossesKeras in the callbacks array.

The model is not performing badly. We can see a little overfitting: the training error decreases steadily while the validation error oscillates around 0.5. After 60 training epochs we reach an MAE of 0.312 on the training set and 0.491 on the validation set, which is not bad.
Let’s see how it performs on the test set:

Notice that the accuracy score is quite low. Let’s analyse the errors by building a confusion matrix and see what we find.
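A confusion matrix only needs numpy; the labels below are synthetic stand-ins for the rounded test-set predictions:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Rows are true ratings, columns are predicted ratings."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix

# Synthetic labels standing in for the model's rounded test-set outputs.
y_true = [5, 5, 4, 4, 3, 2, 1, 5]
y_pred = [5, 4, 4, 5, 3, 1, 1, 5]
print(confusion_matrix(y_true, y_pred))
```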

The highest confusion happens between ratings 4 and 5, with 144 and 62 classification errors respectively. Next come confusions between ratings 3 and 4, with 35 and 31 errors respectively, and 40 errors between ratings 1 and 2. This might be a direct consequence of the unbalanced dataset mentioned previously. A reviewer’s score may also fluctuate around a fixed value: we can predict that target value with low error, but there will always be a hard-to-predict random variation, due to the reviewer’s sentiment, that we cannot capture from the text.

Let’s define a measure, human-level error, as the error rate a human would have if asked to classify the reviews dataset. For example, try to classify one of the entries for which the neural network gave a wrong prediction:

“hotel treated like family traveling stayed days , tours planned days , hotel staff friendly helpful , pointed things arranged day trips pisa hiking countryside , said museums florence n’t think able reservations weeks ahead time , came told david accademia opened : arranged tour uffizi , anytime needed help suggestions eat directions places , come hotel greet big smile name.they said family treated , traveling wonderful bonus , traveled week returned florence hotel europa greeted like long lost relative , absolutely wonderful feel , hotel located near duomo easily locating duomo walking / blocks hotel , loved location duomo window , huge room loved time spent , experienced feeling home hotel , great ,

You would probably give this review a 5, as I did myself; however, its true label is 4. Assuming that humans would make similar mistakes on this dataset, involving the most frequent errors in the confusion matrix, we could say the model is not bad at all: it reaches human-level error, which is a good yardstick for evaluating machine learning models.

I think further improvements could push the MAE even lower, but as this is a just-for-fun project, I am stopping the analysis here. I invite you to continue the experiment or try a completely different approach, and get in touch if you want to share your results.

I hope you enjoyed it; stay tuned for more machine learning and coding.
