Fake copies of our website made us disappear from Google search
Our web developer, Gary Robertson, reports on a little-known black hat technique repurposed to attack the Anglia Research website. He argues that because bad players can so easily hide their identities, this technique poses a potential threat to all businesses, and he suggests how to protect yourself from the horror of a ‘copy and click-farm’ attack.
[Article first published in 2019]
I was on holiday in Tenerife when I received an email from my boss, subject: “Crikey!”
The email contained a link to a website that was – with the exception of the domain name, company name, logo and contact details – an exact replica of the Anglia Research website.
I ran several Google searches and quickly discovered that where our website had previously appeared at or near the top of many search results, now a duplicate, familytreelocators.co.uk, stood in its place. Our pages had been pushed right down the search results and in some cases disappeared altogether.
On the face of it, what we were experiencing was simply copyright violation, and because the theft was so obvious, we were able to get the host of the duplicate website to remove it within a day of discovering it.
Less than a fortnight later it was back online and once again knocking us out of Google searches, while another twin, familytreelocator.com, lurked in the background. It had probably been there for months – we just hadn’t noticed it.
The source code showed when and how the theft occurred. The evidence was in the first few lines: “Mirrored from www.angliaresearch.co.uk by HTTrack Website Copier” on “27 Oct 2018 08:38:59 GMT”.
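If you need to check a suspected copy for the same tell-tale comment, you can extract it programmatically. The sketch below assumes the comment follows the layout HTTrack typically writes: the source address, the copier's name and version, then a GMT timestamp. `find_mirror_stamp` is a hypothetical helper, not part of HTTrack or any other tool. An attacker can simply delete the comment, so a missing comment proves nothing.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Assumed layout of the comment HTTrack writes near the top of each mirrored page, e.g.
# <!-- Mirrored from www.example.co.uk/ by HTTrack Website Copier/3.x [XR&CO'2014], Sat, 27 Oct 2018 08:38:59 GMT -->
MIRROR_RE = re.compile(
    r"Mirrored from\s+(?P<source>\S+)\s+by HTTrack Website Copier"
    r"[^>]*?(?P<stamp>\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2}) GMT"
)


def find_mirror_stamp(html: str) -> Optional[Tuple[str, datetime]]:
    """Return (copied-from address, UTC time of copy) if the page carries the comment, else None."""
    match = MIRROR_RE.search(html)
    if match is None:
        return None  # no comment: either not an HTTrack copy, or the comment was stripped
    # %b expects English month abbreviations, as in the example comment above
    stamp = datetime.strptime(match.group("stamp"), "%d %b %Y %H:%M:%S")
    return match.group("source"), stamp.replace(tzinfo=timezone.utc)
```

If you run it against the saved HTML of a suspected duplicate, a match returns the address that was copied and the time the copy was made. This is the same evidence we found by reading the source.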
We discovered that the domain name familytreelocator.com had been registered on 22 October 2018. The following day, a limited liability partnership was incorporated and officially registered at Companies House as Family Tree Locator LLP, company number OC424587.
That company registration would cause us a lot of problems, because it was why the US host reinstated the first fake website. Apparently their customer had complained about the removal and cited the Companies House registration as proof that they were genuine.
Despite our protestations, we could not get Companies House to remove the company from their register. We were able to show that it was not operating from its listed business address, and that its purported owners do not appear in any available records. It was at this point that we learnt that applications for registration are assumed to be made “in good faith” and are not verified (a month after we posted this article, Companies House updated their rules).
Of course, we contacted the hosts of both duplicate sites and ultimately issued DMCA takedown notices. But it wasn’t until we stopped to consider the possible commercial incentives for the attack that we came to a sobering realisation and decided to inform the police.
When copyright theft becomes fraud
What we were experiencing was qualitatively different from the sorts of copyright violation suffered by most other website owners.
Google “my website has been copied” and you’ll find scores of sites that will tell you what action to take. They don’t typically advise that you go to the police. Website cloning is a common problem. If your pages rank highly in Google search, bad players may copy your website, or some of your content, in an attempt to increase their own ranking.
A higher ranking means that they can better promote their own products or services, or drive traffic to their site to increase ad revenue. Alternatively, they may hope to hijack the authority of your brand to instigate a phishing exercise. Had the hackers been phishing, we could have reported them to Action Fraud.
But the duplicate websites did none of these things. There were no ads on the sites, no attempts at phishing, and no products or services on offer. The phone number went through to a generic voicemail and email enquiries elicited no answer.
Clearly the hackers didn’t want to divert our web traffic to themselves. They wanted to divert it into a cul-de-sac.
Once they had copied our site, it seems likely that they hired a click farm to manipulate Google search results so that theirs replaced ours. Click farms are large groups of low-paid workers. In this case, they would have been hired to click links in search engine results and browse websites, misleading the search engines about what is popular and useful.
When the more aggressive duplicate site was operating at its peak, the only way that someone could easily find our website was if they used the search term “Anglia Research”. That’s because, of all the words on our website, only our company name was not reproduced on the duplicate.
The success of our own SEO had been weaponised against us, so that a search on a term such as “probate genealogist” would return a list of our competitors, along with familytreelocators.co.uk sitting in the position that Anglia Research would usually occupy for that term. This was negative SEO on steroids.
And anyone who followed the link to the Family Tree Locators website found themselves in a dead end where emails and phone calls would not be answered.
So when we considered the commercial incentive for the attack, we realised it could only have been instigated by one of our competitors. Rather than improving their own website to try to beat us fair and square in search engine results, they had employed black hat methods to kick us out of Google search.
There was a specific intent to damage our business, and by our estimate at least three laws were broken: section 2 of the Fraud Act 2006, section 3 of the Computer Misuse Act 1990, and section 1112 of the Companies Act 2006.
It’s sobering to think that a competitor – and necessarily in our sector that would mean a company based in the UK – would be prepared to go so far off piste and beyond Queensberry rules into illegal territory.
They did it because they could
Every business is at risk when stolen copies of websites can outperform originals on Google search, and when bad players can hide behind anonymous web registration and hosting.
We’re a medium sized business operating in a highly competitive industry. Managers of similarly sized businesses in different sectors might tell themselves, “Probate research is unregulated. It’s the wild west compared to our own sector.”
There’s a small degree of truth in that. Probate genealogy, whilst amazingly staid, finicky and bookish in many ways, is also terrifically competitive.
However, don’t kid yourself. Only a tiny number of bad players in our industry have deep enough pockets to afford black hat services. When the black hats who offer this copy and click-farm service have finished with us, they will move on to their next patch; it may be yours.
Be ready for them
- Two simple measures will make it more difficult for bad players to copy and misuse your site. The first is to use the canonical tag (rel="canonical") to indicate that your webpages are the originals. However, this will not protect you against a sophisticated attacker, who will simply replace your domain name in these tags with their own. The second measure is to use absolute URLs rather than relative URLs for all internal links on your website. Again, a determined attacker will replace your domain name with their own.
- Keep website logs. We were overcautious when the EU General Data Protection Regulation (GDPR) came in and stopped logging website visitors. This meant that we threw away valuable information about the person who copied our website at 8:38 on 27 Oct 2018. We have revisited this and now record IP addresses to protect our legitimate business interests. You might want to revisit your own GDPR decisions and take legal advice.
- Make copy-and-paste theft more difficult. Your website developer should be able to write code that, for example, disables right mouse-button clicks and certain keyboard shortcuts. However, you cannot stop a determined attacker from copying your entire website and putting their own version online.
- Monitor the Internet for close copies of your entire website. This can be done for free by creating Google Alerts using a few random, unique phrases from the main pages of your site. Should the phrases appear elsewhere, you will be sent emails to alert you as soon as Google indexes the duplicate pages.
- If you are concerned that particular sections of your website are being stolen, for example blog posts, you can use a free plagiarism checker website. Subscription services like CopySentry check regularly and let you specify pages or websites to ignore when doing so.
- Once you spot a duplicate website online you are in a race. The black hats will be busy raising their fake site up the search results and pushing your site down. Whether they are using a click farm, private blog network or any other method, their efforts will take some time to bear fruit. Nevertheless, you will need to move fast.
- Ask a developer to save a full working copy of the duplicate website to use as evidence later and hopefully provide some clues to the identity of your attackers.
- Gather all the information you will need for DMCA takedown notices. With every DMCA notice you submit, you should provide solid evidence that you are the copyright holder. Your attackers may well claim that you are not, as happened to us. One way of countering this is to send a link to an old snapshot of your website on the Wayback Machine.
- To have the best chance of success you will need to send DMCA notices to search engines as well as the duplicate website’s host. It is better to deal with the search engines first, in order to get a better idea of the information you will need for the host. Fill in the Google Console form for Google search and the Bing Webmaster Tools form for Microsoft.
- Next, send a DMCA takedown notice to the host of the fake website. You will normally find it more cost-effective to use a specialist for this, not least because it can be hard to find out who the host is and how to contact them. Search online for “DMCA takedown service”. If you do need to find the host yourself, an online hosting lookup tool may help.
- If the website host is not also the domain registrar you should try to send a separate DMCA takedown notice to the domain registrar, if you can find who they are.
- Sometimes the website host has closed down a fake site but it still appears prominently in search results. This can persist for some time, so you should tell the search engines that their results are out of date. You can do this through Google’s Remove Outdated Content tool and the Content Removal Tool in Bing Webmaster Tools.
- Make a report to your local police. As I said earlier, this sort of attack is clearly criminal, not just copyright theft. What we experienced were, in effect, denial of service attacks, in that they sought to deny access to our website. However, Action Fraud’s online reporting is currently very specific to distributed denial of service (DDoS) attacks, so we were unable to report through this channel. Nevertheless, we were greatly cheered by how seriously our local police took the attacks and how well equipped they are to investigate them.
- Be more paranoid. If a competitor is using black hat techniques to attack your website on search engines, they may be using similar methods to attack you in other ways. For example, if you use pay-per-click advertising such as Google Ads, check whether click fraud is being used to increase the cost of your ads or to exhaust your budget.
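The first item in the list above describes two measures: canonical tags and absolute URLs. Both are easy to let slip as a site grows, so it can be worth checking every page automatically. The following is a minimal sketch in Python that uses only the standard library. It assumes you can fetch each page's HTML as a string. `LinkAudit`, `audit_page` and the `expected_host` parameter are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class LinkAudit(HTMLParser):
    """Collects the canonical URL and any relative link targets found on one page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None
        self.relative_links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = href
        elif tag == "a":
            parts = urlparse(href)
            # No scheme and no host means a relative link; skip in-page anchors like "#top"
            if not parts.scheme and not parts.netloc and not href.startswith("#"):
                self.relative_links.append(href)


def audit_page(html: str, expected_host: str) -> List[str]:
    """Return a list of problems: missing or foreign canonical tag, plus each relative link."""
    parser = LinkAudit()
    parser.feed(html)
    problems = []
    if parser.canonical is None:
        problems.append("no canonical tag")
    elif urlparse(parser.canonical).netloc != expected_host:
        problems.append(f"canonical points at {parser.canonical}")
    problems += [f"relative link: {href}" for href in parser.relative_links]
    return problems
```

An empty list means the page applies both measures. As the article notes, this cannot stop an attacker who rewrites your domain name. It only confirms that your own pages use both measures consistently.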
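On the advice to keep website logs: a copy made with a site copier's default settings may stand out in an access log, because tools such as HTTrack and Wget identify themselves in the user-agent string unless told otherwise. The sketch below scans log lines in the Apache/nginx "combined" format. That format is an assumption, so adjust the regular expression to match your server's output. It flags IP addresses that either used a copier's default user agent or made more requests than a threshold. `suspicious_ips`, `COPIER_AGENTS` and the threshold of 500 are illustrative choices, not recommended values.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumes the Apache/nginx "combined" access-log format.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Substrings that appear in some copiers' default user agents (easily changed by an attacker).
COPIER_AGENTS = ("HTTrack", "Wget")


def suspicious_ips(lines: Iterable[str], max_requests: int = 500) -> List[str]:
    """Flag IPs that announced a copier user agent or made more than max_requests requests."""
    counts: Counter = Counter()
    flagged = set()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the combined format
        ip = m.group("ip")
        counts[ip] += 1
        if any(name in m.group("agent") for name in COPIER_AGENTS):
            flagged.add(ip)
    flagged.update(ip for ip, n in counts.items() if n > max_requests)
    return sorted(flagged)
```

A careful attacker will change the user agent and spread requests across many addresses, so treat the output as leads for your developer or the police rather than as proof.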
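The monitoring advice calls for a few random, unique phrases from your main pages. The sketch below handles the random part. It picks non-overlapping runs of consecutive words from a page's text and wraps each in quotation marks, so that an alert matches the exact phrase. `alert_phrases` and its defaults (three phrases of eight words) are assumptions for illustration. Google Alerts is set up through its web interface, so you paste the output in by hand. Before creating each alert, search for the phrase once to confirm it really is unique to your site.

```python
import random
import re
from typing import List


def alert_phrases(page_text: str, count: int = 3, words: int = 8, seed=None) -> List[str]:
    """Pick `count` non-overlapping runs of `words` consecutive words, quoted for exact matching."""
    tokens = re.findall(r"[\w'’-]+", page_text)
    windows = len(tokens) // words  # number of whole, non-overlapping word runs available
    if windows < count:
        raise ValueError("page text is too short for the requested phrases")
    rng = random.Random(seed)  # pass a seed to get repeatable choices
    chosen = sorted(rng.sample(range(windows), count))
    return ['"' + " ".join(tokens[i * words:(i + 1) * words]) + '"' for i in chosen]
```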
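For the Wayback Machine evidence in a DMCA notice, the Internet Archive offers an availability API that returns the archived snapshot closest to a given date. The sketch below builds the query and parses the JSON response. The endpoint and response shape (`archived_snapshots.closest` with `available`, `url` and `timestamp` fields) follow the Internet Archive's Wayback availability API. The helper names are hypothetical. The API returns the *closest* snapshot, which may be later than the requested date, so check that the returned timestamp predates the copy before citing it.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_query(site: str, near: str) -> str:
    """Build the availability-API URL for the snapshot nearest `near` (YYYYMMDD or longer)."""
    return f"{API}?{urlencode({'url': site, 'timestamp': near})}"


def closest_snapshot(response_json: str) -> Optional[Tuple[str, str]]:
    """Extract (snapshot URL, timestamp) from an API response, or None if nothing is archived."""
    closest = json.loads(response_json).get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def find_snapshot(site: str, near: str) -> Optional[Tuple[str, str]]:
    """Network call: ask the Wayback Machine for the snapshot nearest the given date."""
    with urlopen(availability_query(site, near), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `find_snapshot("www.example.co.uk", "20180901")` would return a link you could include in the notice, provided the site was archived around that date.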
2024 Anglia Research Services All Rights Reserved.
Anglia Research and Anglia Research Services are trading names of Anglia Research Services Limited, a company registered in England and Wales: no. 05405509