A Fun Program for Anyone With Too Much Time

2018-11-18 19:04:45

Originally posted to Reddit. See Thread
This was posted to a private subreddit. Only some users will be able to visit the original post.

I'm a night auditor, which means that during a normal audit I have about three hours of work total and like five hours of being trapped in a room with a computer and nothing to do. Our property is entering its off-season now so even the evening shifts I work (like right now) are very quiet.

The Question


I was recently thinking about the comments we put on our reservations and wondered which words are the most common. My guess was that the most-used would be "guest" and "reservation" because every comment is about one or the other. I wanted to find out.

The Process


Fortunately I am very fluent in Java and I know the administrative passwords for the work computers, so I put Eclipse and a JRE on our audit computer and set about writing a program to answer this question for me. I generated a report of all comments this calendar year, and our PMS really struggled to deliver it. When I exported to a .CSV file the report was 3.5 MB, which is apparently asking way too much of our software, but I managed to get it saved to the local machine. After I used Excel to trim out all the noise I didn't need, the file was about 750 KB and ready for processing.
I wrote up a quick little tool to parse .CSV cells and a quick-and-dirty program to count up all the different words and give me some numbers. After proving that the concept was solid, I set about making the output data more meaningful. I wrote a method which lets the program treat plural and non-plural words as if they were the same word (so "kittens" and "kitten" are the same word) and another which does the same for verb forms and related endings ("meow", "meowing", "meowed", "meower", "meows").
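For anyone curious what that kind of counting looks like, here is a minimal sketch. It is not the code from the GitHub project: the class name `WordTally`, the suffix list, and the three-letter minimum stem are all assumptions made for illustration, and the suffix stripping is a crude stand-in for whatever rules the real program uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordTally {

    // Hypothetical suffix rules, checked in this order; a crude stand-in for real stemming.
    private static final String[] SUFFIXES = {"ING", "ED", "ER", "S"};

    // Upper-cases a word and strips one known suffix so related forms share a key.
    static String normalize(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            // Only strip when at least three letters of stem remain.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Tallies normalized words across all comments.
    static Map<String, Integer> count(List<String> comments) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String comment : comments) {
            // Split on anything that is not a letter or an apostrophe.
            for (String token : comment.split("[^A-Za-z']+")) {
                if (!token.isEmpty()) {
                    counts.merge(normalize(token), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the n most frequent words, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> comments = Arrays.asList(
                "The kitten meowed.",
                "Kittens meow; one meower was meowing.");
        for (Map.Entry<String, Integer> e : top(count(comments), 3)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Rules this blunt will mangle some words (for example, "never" loses its "er"), so a real tool needs exceptions or a proper stemming approach.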
I then set about making the program flexible. I could have written the program to work perfectly for just my own reports, but I know that (A) our PMS has a very small market share and (B) everyone uses different PMS software anyway, so I wanted to design it to be flexible and configurable. All a user needs to do is convince their PMS to give them a .CSV dump* of their reservation comments and then give my program's configuration file some very simple instructions on how to read that .CSV file; my program takes care of the rest.
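To give a feel for what "very simple instructions" could look like, here is a sketch of a configuration reader built on Java's standard `Properties` format. The real project's configuration file may look nothing like this: the keys `comment.column` and `skip.header` and the class name `CommentColumnConfig` are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class CommentColumnConfig {
    final int commentColumn;   // zero-based index of the column holding the comment text
    final boolean skipHeader;  // true when the first row holds column titles

    CommentColumnConfig(int commentColumn, boolean skipHeader) {
        this.commentColumn = commentColumn;
        this.skipHeader = skipHeader;
    }

    // Reads the two hypothetical keys from text in .properties syntax.
    static CommentColumnConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int column = Integer.parseInt(p.getProperty("comment.column", "0").trim());
        boolean skip = Boolean.parseBoolean(p.getProperty("skip.header", "true").trim());
        return new CommentColumnConfig(column, skip);
    }

    // Picks the configured column out of rows that are already split into cells.
    List<String> extractComments(List<List<String>> rows) {
        List<String> comments = new ArrayList<>();
        for (int i = skipHeader ? 1 : 0; i < rows.size(); i++) {
            List<String> row = rows.get(i);
            if (commentColumn < row.size()) { // ignore short rows
                comments.add(row.get(commentColumn));
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        CommentColumnConfig cfg = parse("comment.column=2\nskip.header=true\n");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Conf", "Date", "Comment"),
                Arrays.asList("123", "2018-11-01", "Guest called about rate"));
        System.out.println(cfg.extractComments(rows));
    }
}
```

Keeping the column position and header handling in a file means a different PMS export only needs a different two-line config, not a code change.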

The Solution


If anyone should care to toy around with it, I've made a small stand-alone program available on GitHub (see the README) to crunch some of that comment data for you. Your computer just needs to have Java installed; beyond that, just follow the instructions on the GitHub project page.
Of course the complete source code is available too, if you feel like looking it over or messing with it. It's commented heavily and it's published under the GPL so you're welcome to go to town on it if you so desire.

The Results


Our PMS has three different kinds of comments:
  • CRS comments (immutable, generated by CRS, cannot be created locally)
  • Front Office comments (immutable, time-stamped, labelled as to who wrote them)
  • Reservation comments (can be modified or deleted, not time-stamped, not labelled as to who wrote them)
I crunched my Front Office and Reservation comments (3,622 and 4,469 respectively) and took a look at the results. I discovered, with no surprise, that I have written the most Front Office comments this year (1,067) with the second-place commenter having only 644. The longest comment of any type was 276 words (1,375 characters).
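Numbers like these are cheap to compute once each comment is paired with its author. The following is a sketch under assumptions, not the project's code: the `Comment` class, its fields, and the sample names are hypothetical, and a "word" here simply means a run of non-whitespace characters.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class CommentStats {

    // Hypothetical record of one labelled comment.
    static class Comment {
        final String author;
        final String text;

        Comment(String author, String text) {
            this.author = author;
            this.text = text;
        }
    }

    // Counts how many comments each author wrote.
    static Map<String, Integer> perAuthor(List<Comment> comments) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Comment c : comments) {
            counts.merge(c.author, 1, Integer::sum);
        }
        return counts;
    }

    // Counts whitespace-separated words; blank text counts as zero.
    static int wordCount(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // Finds the comment with the most words, if there are any comments at all.
    static Optional<Comment> longest(List<Comment> comments) {
        return comments.stream()
                .max(Comparator.comparingInt((Comment c) -> wordCount(c.text)));
    }

    public static void main(String[] args) {
        List<Comment> comments = Arrays.asList(
                new Comment("ana", "Guest called about rate"),
                new Comment("ben", "Checked in early"),
                new Comment("ana", "No show"));
        System.out.println(perAuthor(comments));
        longest(comments).ifPresent(c ->
                System.out.println(wordCount(c.text) + " words, " + c.text.length() + " characters"));
    }
}
```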

These are the top ten words for our Front Office comments:
  1. TO: 3384
  2. THE: 3321
  3. I: 2947
  4. GUESTS: 2217
  5. AND: 1954
  6. IN: 1897
  7. A: 1717
  8. FOR: 1432
  9. ROOM: 1357
  10. WAS: 1225
I was surprised, in a way; I was expecting more industry-related stuff. I would guess these are simply the most common words in English overall, which would be why they're also the most common here. So I've picked out the top industry-related words:
  1. GUESTS: 2217
  2. ROOM: 1357
  3. CHECKED: 888
  4. CALLED: 631
  5. AUDIT: 615
  6. RESERVATION: 560
  7. NIGHT: 515
  8. CARD: 512
  9. RATE: 348
  10. FEE: 328
  11. STAY: 313
  12. EARLY: 313
The list runs to twelve rather than ten because "EARLY" tied "STAY" for eleventh place, and screw early check-ins.
I considered including the whole dump in this post, but it contains a lot of proper names (breach of confidentiality) and words which would definitively identify our brand and location (violation of subreddit rules) so I'll just leave it with this.

Conclusion


I love programming and I spend a lot of time trapped in a room with a computer, so this happened. If you'd like any help with the program please let me know! Have fun.

* For Those Unfamiliar with CSV:


CSV stands for "Comma-Separated Values" and is one of the best ways to handle bulk spreadsheet data. Any software that can export anything at all should (if it's worth its bits) be able to write a .CSV file.
CSV files are awesome because they're super-easy for humans and computers to read. They don't carry any formatting information (like red cells with blue text) or any proprietary data (like all the extra junk Microsoft puts in Excel files). A CSV file is just all the stuff in the spreadsheet separated by commas, with quotes around any cell that itself contains a comma. For this reason it's a massively flexible file format. It's super-easy to write new software to manipulate .CSV information, which is why this program came to exist.
When attempting to export data from a PMS, under "Export Format" or "File Type" or whatever of that nature, choose ".CSV" or "Comma-Separated" or "Comma-Delimited".
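To show how little code reading CSV takes, here is a minimal sketch of splitting one line into cells. It is an illustration rather than the parser from the project; the class name `CsvLine` is made up. It handles the common quoting convention (a cell wrapped in double quotes may contain commas, and a doubled quote inside it means a literal quote) but assumes no cell contains a line break.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {

    // Splits one CSV line into cells, honoring double-quoted cells.
    static List<String> parse(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cell.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cell.append(c); // commas inside quotes are ordinary text
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                cells.add(cell.toString()); // cell boundary
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString()); // last cell has no trailing comma
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(parse("123,\"Guest called, wants early check-in\",OK"));
    }
}
```

Reservation comments are full of commas, so splitting on every comma without tracking quotes would chop a single comment into several cells.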
