
Jan. 13, 2022

Haskayne scholars use artificial intelligence to help detect fraudulent websites

Research will help protect consumers from 'whack-a-mole' of illegitimate sites on the internet

Every year, billions of dollars are lost to online fraud when people enter their credit card or other financial information on a website thinking it’s a credible entity, when it’s not.

“Online fraud and illegitimate websites are a major financial drain on the system,” says Dr. Raymond Patterson, PhD, professor of business technology management (BTM) at the Haskayne School of Business. “From banking to credit cards to regulators, a number of players would love to have a better handle on finding these illegitimate, fraudulent websites. There are a lot of different interests that would desperately benefit from knowing who is a good player and who is a bad player.”

But identifying the bad guys is easier said than done. Unlike a bricks-and-mortar enterprise, fraudulent websites can just vanish. “It's like playing whack-a-mole with an illegitimate or a fraudulent website,” says Patterson. “Once they're discovered, they'll just go out and get a new URL. It's very hard to just have a list of bad guys. They just move.”

Patterson, Haskayne PhD student Afrouz Hojati and Dr. Ram Gopal of the University of Warwick, U.K., developed artificial intelligence techniques to help detect illegitimate websites. They built algorithms that could, with further research, provide a first line of defence in identifying potentially fraudulent websites. The research was published in Decision Support Systems.

Research the first to make inroads on a generalized fraud detector

The research provides the first steps toward developing consumer protection tools, such as an early warning alarm that goes off when visiting a potentially fraudulent website. “This is really one of the first major attempts to make inroads on a generalized fraud detector. A lot of the previous research has been in specific domains,” says Patterson.

The research builds on an algorithm the researchers developed to identify “fake news” websites. “It was a very successful algorithm to detect whether or not a news website was fraudulent or legitimate. It had very high accuracy. But when you take those algorithms that are crafted for a particular context, like news, they don't do as well when you just throw any website at them.”

In this research, the scholars used websites’ third-party request structure and information about the legitimacy of third parties to create real-time machine learning algorithms. “Whenever you go to a website, there are sometimes hundreds of third parties,” says Patterson. “You can have a third party that calls another third party, and then they call a bunch of third parties. And throughout the layers, some of those are actually very nefarious third parties.”
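The layered calling that Patterson describes can be pictured as a graph walk. The sketch below is illustrative only, not the researchers' actual method: it assumes a hypothetical who-calls-whom map and uses a breadth-first traversal to record the layer at which each third party is first reached.

```python
from collections import deque

# Hypothetical who-calls-whom map: each party lists the further third
# parties it requests in turn. All names here are made up for illustration.
CALLS = {
    "shop.example": ["ads.example", "cdn.example"],
    "ads.example": ["tracker.example", "exchange.example"],
    "exchange.example": ["tracker.example", "shady.example"],
}

def third_party_layers(root):
    """Breadth-first walk of the request graph, recording the depth
    (layer) at which each third party is first reached."""
    layers = {}
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        party, depth = queue.popleft()
        for callee in CALLS.get(party, []):
            if callee not in seen:
                seen.add(callee)
                layers[callee] = depth + 1
                queue.append((callee, depth + 1))
    return layers

print(third_party_layers("shop.example"))
# e.g. ads.example and cdn.example at layer 1, shady.example at layer 3
```

A real detector would observe these requests in the browser or a crawler rather than from a static map, but the depth information, how far down the chain a party appears, is the kind of structure the article describes.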

Third parties can crash your online party

Patterson compares the third parties to teenagers throwing a party. “You call three or four or 10 of your friends for a party on Friday night and 200 people show up. It’s exactly the same thing that happens with third parties. You might have called a few of your friends, but they called a whole bunch of their friends, and they called a bunch of their friends too.”

The researchers created software that lets them see the call structure, the “who's calling who,” of those third parties. “Then we can reconstruct the order of calls. So in a sense, I know which teenagers invited which kids to mom and dad's house.” The software runs the list of third parties through a standard database to identify whether they are legitimate or not.
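The final lookup step can be sketched as a simple set intersection. This is a stand-in, assuming a hypothetical known-bad list in place of whatever standard database the researchers actually queried:

```python
# Hypothetical list of third parties known to be illegitimate; a real
# system would query a maintained reputation database instead.
KNOWN_BAD = {"shady.example", "tracker.example"}

def flag_suspicious(third_parties):
    """Return the observed third parties that appear on the bad list."""
    return sorted(set(third_parties) & KNOWN_BAD)

observed = ["ads.example", "cdn.example", "tracker.example", "shady.example"]
print(flag_suspicious(observed))
```

The presence and position of flagged parties in the call structure is then what a machine learning model could use as a signal that the hosting website itself is fraudulent.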

The researchers’ generalized algorithm works across a broad spectrum of websites in different industries. It’s less costly, less computationally complex, and less time-consuming than existing detection algorithms. And because this third-party information can be directly observed, it’s hard for nefarious actors to manipulate or circumvent the algorithm.

“This is a really tough problem,” says Patterson. “I've been thinking about this problem for a long time and I've always referred to it as the holy grail. We finally made headway when we started addressing not the algorithms, but the data structures. How could we represent the data in a different way that would shed a little more light and make it easier for the algorithms to detect? It’s a one-size-fits-all approach.”