Those are not faked (read: photoshopped) – that is how much I paid for those items sometime in 2014. Those are really good deals, especially in Canadian Dollars; most of those items retailed for over $100 CAD here.
These are all known as price error items: the merchant did not intend to sell them at those prices. Sometimes, however, logistics software screws up, and there’s a community of people who want to scoop these things up.
They actually come up quite a bit:
.. at least a few times a week in most cases. Of course, not all items are of interest.
The process of ordering something that was priced in error
To get in on this, you need to…
You need a quick reaction time, as these are often fixed within an hour – sometimes even less than that. Other factors affect your odds as well. In Canada, if you can make an in-store reservation at that price and then go in to get it the same day, you have better odds than having it shipped: online orders are sometimes canceled, but in-store fulfillment rates are high.
Since reading every thread frequently is tedious, we need automation. Some off-the-shelf tools already do this, and we’ll dive into those first to motivate building something new.
The manual process of finding some of these
Generally, there are a few ways of doing this:
No single strategy is going to get you 100% of the way there, and some services already provide automated ways of doing this to increase your chances. I was a frequent user of method #2 for a while, but it takes up a lot of time: the forums are often the best sources of the more obscure deals, and keeping up involves reading pretty much every thread. Wouldn’t it be nice if we could just get a notification when someone posts something of interest?
Slickdeals has something to do this and so does RedFlagDeals.
I’ve used them. They both have some problems, though:
We know the inflexibility problem in this case exists because the “data source” and the notification system (and the querying of the data) are coupled to each other. We have no way of querying the data ourselves; the communities provide one-stop shops.
Our solution? Decouple the stream of data from the notifications and allow ourselves to create more customized ways of filtering the data. IFTTT is an inspiration for Huginn and kind of solves this problem. IFTTT actually has a widget for Slickdeals here, but it’s not very customizable. You could likely create your own using IFTTT with some work, but you won’t get #3 for free that easily, and you will have trouble getting #2. IFTTT is quite fast, though.
If you don’t have your own infrastructure, it’s not a bad way to go. But let’s look at alternatives using Huginn to do this. I’m not going to cover the installation, so check out their install guide (I recommend Docker if you can use it) or my install guide using unRAID if you want to follow along. Otherwise, time to get started.
If you don’t know what Huginn is, you can read more about it here. It’s essentially a system that allows you to pull data from various sources, aggregate it into events, query it, and then act on it. That sounds sort of vague, right? But it’s kind of exactly what we need…
This is our general plan of attack:
Our notification stack is going to be Pushbullet for a few reasons:
We’ll pull our data from boring Atom/RSS feeds – reliable, updated fairly quickly, and standardized. Huginn also supports web scraping for more speed (since RSS feeds can sometimes lag behind) and for sites that do not provide federated access to their data. We won’t touch that here, but it would be fairly easy to implement and swap in.
And in the end, our goal is to look something like this:
in response to something like this:
That’s the plan. Let’s figure out how to execute.
A motivation for Huginn
This is all just some basic HTTP requests and data extraction! I can write something working in a couple hours, I don’t need Huginn!
Some Developer before attempting this, probably. (2018)
If you had this reaction, read the requirements below, where the explicit and implicit requirements are dissected along with the complications that might arise:
cron – but now you have to manage cron jobs. I mean, it’s not that you can’t write all that, but it’s a matter of whether you should or need to. Setting this all up can easily take a few hours if you’ve never done it before, and even an experienced developer is going to take a bit – and then you have to deploy it. So, that means some kind of deployment technology is needed.
Maybe read on first and then come back and see if this is still appealing.
Huginn uses things called events to pool data into a queue that can be processed by other agents. As a first step, let’s take a look at using this RSS feed to get information out.
We’ll create a new RSS Agent and set it up like so:
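If you’d rather paste options than click through the form, the agent’s JSON options might look roughly like this. The feed URL below is a placeholder – substitute the Hot Deals feed you want to watch – and the option names come from Huginn’s RSS Agent, so they may vary slightly by version:

```json
{
  "expected_update_period_in_days": "1",
  "clean": "false",
  "url": "https://example.com/hot-deals/feed"
}
```

`expected_update_period_in_days` just tells Huginn when to consider the agent “not working” so it can flag a dead feed for you.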
A few explanations in order:
With that saved, go ahead and run it, and you should see some events appear in the event stream:
Perfect, we have something that now runs and gets the data and serves it up for us. Let’s get a filter in front of that now.
We now have our own copy of the data via RSS. It could just as well have come from scraping the HTML with the scraper agent; it doesn’t matter – what matters is that we now have events. We don’t want to send every Hot Deal to our devices, since that would just result in too many notifications – probably at least three dozen per day, minimum – and we don’t want to have to keep checking these.
First, let’s do a little bit of research using Google to figure out what we can expect.
I’ll save you the trouble if you don’t want to search yourself through pages of data and let you know what I came up with:
Let’s get a corpus of data going and then try writing some Regular Expressions to match the examples we want. Using a basic script, we can get some of it out:
const cheerio = require('cheerio');
const fs = require('fs');

// Load a saved copy of the feed from disk.
const data = fs.readFileSync('./file.xml');

// Parse as XML, since this is a feed rather than an HTML page.
const $ = cheerio.load(data, {
  xmlMode: true
});

// Print the text of every <title> tag in the feed.
$('title').each(function (index, titleNode) {
  console.log($(titleNode).text());
});
(There’s probably a way to do this in shell as well – but this got me the results in just a few minutes, so it’s good enough.)
Our next step is to write a regular expression that matches as many of the items in the corpus as we can without tripping too many false positives. Consider the following regular expression as an example:
/price.(error|mistake)|(h.*o.*t(!))|error|hot/gmi
It covers a few of the cases here without triggering false positives on the items above. There are a few items in the list I might personally be interested in (the pre-order for that video card…), but let’s keep our task focused so that we don’t have scope creep.
With this basic regular expression (in practice, I use a slightly more complicated one, but this is pretty good), we now have a solid basis to filter against. We could tweak it a bit more, but it gets us a lot of value for low effort (this cruddy regular expression only took 4 minutes to write – yay!).
Creating a new Trigger Agent with the following configuration would give us what we want:
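For reference, Trigger Agent options along these lines should do the job. This is a sketch rather than a copy of my production agent; note that Huginn evaluates the rule as a Ruby regex, so JavaScript-style flags can’t be attached – an inline `(?i)` prefix is one way to get case-insensitivity:

```json
{
  "expected_receive_period_in_days": "2",
  "keep_event": "true",
  "rules": [
    {
      "type": "regex",
      "value": "(?i)price.(error|mistake)|(h.*o.*t(!))|error|hot",
      "path": "title"
    }
  ],
  "message": "{{title}} - {{url}}"
}
```

The `path` assumes the upstream RSS events carry a `title` field, which the RSS Agent provides.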
(Trigger Agents allow us to filter data and then generate other events based on it. You can think of the raw data as the base facts and this as a refinement of those facts. Specifically, it generates additional events for the items we want real-time information about. We could still use the raw data for other things, since the agent does not transform it but rather emits a new event for consumption – for example, a daily digest of all the Hot Deals at the end of each day.)
You can use the “Dry Run” functionality to inject some test data to verify it works – but our corpus data should at least give us some confidence. If you wanted to take this to another level, you could have a small toy app with two sets of test fixtures – valid and invalid – and a script that runs the regex against each test case whenever you modify it, to make sure nothing breaks. For now, my regular expression has been running fine, so I have no need to modify it…
Time to get that info into the cloud.
The last part is the easiest – we just need to get the events we generated onto our Pushbullet devices. Go ahead and create a Pushbullet Agent, and make sure you get an API key from them first.
You can configure it similar to below:
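As a sketch, the agent’s options would look something like the following. The `device_id` value is a placeholder (the agent’s description in Huginn explains how to look yours up), and the body uses the same Liquid fields the upstream events carry:

```json
{
  "api_key": "YOUR_PUSHBULLET_API_KEY",
  "device_id": "123456",
  "title": "Deal alert!",
  "body": "{{title}} - {{url}}"
}
```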
.. and you do need to fill in your API key. After that, you should have everything you need. Since this is a simple, dumb relay that lives at the end of the flow, I’m comfortable introducing a cloud dependency here – it could easily be swapped for anything else.
That’s it! With that, we should now have real-time updates to our devices like we showed above. It all works – but you don’t have to take my word for it: you can create a manual agent and trigger some events to watch the entire thing work together. You could also try finding a deal and posting it, which would trigger your new pipeline as well.
With this, we have now completely automated the process of getting notified of price errors and other interesting deals. I’ve been running a variant of this for a while and have scooped up at least a few things, such as Google Home Minis for $30 due to a coupon error. The original thread title used to say HOT, but it was modified once the deal expired. It’s not perfect, so I often combine this with a digest at the end of the day to comb through deals that aren’t errors but that I may still be interested in.
Happy savings & hope you learned something!
Got a better idea for a more comprehensive regular expression? Are there better platforms? Just want to say hi? Let me know in the comments.