In a few podcasts I have listened to recently (DataSkeptic and Linear Digressions), the word2vec algorithm has been mentioned. It allows machines to recognise patterns in text to an extent that may not even be possible for humans.
This is explained much better than my semi-illiterate ramblings could manage by the people behind Deeplearning4j, the Java library that Dave Dupplaw (the most scientific developer I have worked with) had used for a ‘spike’ to look at some virtual agent chat logs:
“Given enough data, usage and contexts, Word2vec can make highly accurate guesses about a word’s meaning based on past appearances. Those guesses can be used to establish a word’s association with other words (e.g. “man” is to “boy” what “woman” is to “girl”), or cluster documents and classify them by topic. Those clusters can form the basis of search, sentiment analysis and recommendations in such diverse fields as scientific research, legal discovery, e-commerce and customer relationship management.”
Machine learning algorithms require ‘training’, which in this case means giving the machine text to learn from so that it can make these predictions. I thought it might be interesting to gather every tweet I could scrape from Twitter that referenced the football teams local to me. I then asked the machine for the words it thought were synonymous with the name of each club.
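To make the idea of "words used in similar contexts" a little more concrete, here is a minimal sketch in Python. It is not word2vec and it is not the code I ran: it swaps in a much simpler technique (counting neighbouring words and comparing the counts with cosine similarity) that rests on the same intuition. The tweets, the handles `@team_a` and `@team_b`, and all of the function names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Keep @handles and #hashtags as single tokens, since those are
# the "words" we want to compare.
TOKEN_RE = re.compile(r"[@#]?\w+")


def tokenize(tweet):
    """Lower-case a tweet and split it into words, handles and hashtags."""
    return TOKEN_RE.findall(tweet.lower())


def context_vectors(tweets, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tweet in tweets:
        tokens = tokenize(tweet)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(vectors, target, topn=5):
    """Rank every other word by how similar its contexts are to the target's."""
    target_vec = vectors.get(target, Counter())
    scores = [
        (other, cosine(target_vec, vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical training data: two made-up handles used in the same way.
tweets = [
    "great win for @team_a today",
    "great win for @team_b today",
    "tickets on sale for @team_a tomorrow",
    "tickets on sale for @team_b tomorrow",
    "the weather is lovely this morning",
]

vectors = context_vectors(tweets)
print(most_similar(vectors, "@team_a", topn=3))
```

Because the two made-up handles appear in identical sentences, `@team_b` comes out as the closest match to `@team_a`. Word2vec learns dense vectors with a small neural network rather than raw counts, but the "replacement word" reading of the results is the same: a suggested word is one that tends to turn up in the same kinds of sentences as the club's handle.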
The results weren’t as jovial or offensive as I had hoped they might be, but I thought them worth sharing anyway as they may prompt some interesting discussion. For each team below I give a quick summary of the words I thought were of most interest, followed by the other suggestions for completeness.
Bristol Rovers - @official_brfc
The references to “belief”, “labour” and “care” maybe highlight the fans’ thoughts on the current squad the manager has assembled.
Other words the algorithm suggested were: @georgesberthonn, #utgbelieve, pro, hype, £+, @vfootball, @jamiedouble, @amykatelfc, @southglostaxis, @nbfootball, loanees, @voiceofanfield, @cooper, @geoffman, police, @brstogetchamp, @marvinjrees, @stephenlansdow, @bbcrb, @neilmaggs, cares, @uklabour, aint, labour, just…, dates, tick…@bcfctweets, @official_brfc, clearly, sometimes, ayling, luke, @bbcrb, ha, april, @angusscott, @manutd, @wada_ama, @wolves, @sporting_cp, @btsportfootball, @ecaeurope
Bristol C*ty - @bcfctweets
It seems to have worked more effectively for the South Bristol club. The words returned are more obvious replacements for the City Twitter handle in sentences, possibly because there were more tweets to use as training data. The replacement word “Bristol” is potentially present due to their marketing: they have attempted to make this word more prominent in their “Bristol Sport” franchise(?) marketing.
coyr, @instagatebs, @leejohnsoncoach, @bcfc_supporters, extension, @andystockhausen, @tommyde, fail, £m, mark, litts, owner, saying, april, @official_stfc, guests, frank, @bristolpcity, disappointment, wednesday, drive, johnson, mcallister, #generalelection, @bathcast, #bcfctweets, extension, mcallister, johnson, tuesday, bristol
Bath City - @bathcity_fc
The non-league clubs seem to have less relevant results because there is less data for them. However, aside from nicknames etc. being picked up as expected, I wonder if the reference to “maybe” is present due to their narrowly missing the upper reaches of the division.
paint, twerton, manager, @jon_boa, money, @wingcommander, #readingfcrt, breakfast, @official_stfcrt, money, season, @benatkinsonuk, manager, x@official_stfc, manager, paint, #romans, painting, @jon_boa, twerton, nailsworth, paint, @jon_boa, manager, @samelliott_nlp, #romans, @pafccommunitytr, @cassidymarcus, @teambath, @carolebanwell, @bbcrtoday, maybe, @bcfc_academy, @bathcitychat, range
Forest Green - @fgrfc_official
Interestingly, the first ten suggested replacement words for “Forest Green” include “vegan”, “#vegan”, “#nationalvegetarianweek” and “#govegan”. “#meatfreematchdays”, “@seashepherd” and “vince” also get highlighted. You can therefore tell that the morals of their owner, famed for Ecotricity, are heavily reflected in the comments following their promotion to the Football League.
club, vegan, league, #nationalvegetarianweek, #vegan, @seashepherd, #govegan, promotion, @ffa, @mmykie, #meatfreematchdays, #govegan, @aleague, club, league, #meatfreematchdays, the, in, @mmykie, vegan, #vegan, achieving, #meatfreematchdays, @ffa, @mmykie, englands, football, @kcmanc, vince
Gloucester City - @gcafcofficial
I couldn’t mention Forest Green without mentioning Gloucester City. Unfortunately, I also couldn’t see anything of interest in the algorithm’s recommendations in this case. This is only to be expected with a lot less data than there is for play-off winners and league clubs.
Straw, #formershots, manny, carole, performances, andy, kinda, wanted, @inbath, hayward, bath, wingrt, @piratefm, missing, loan, twerton, @inbath, positive, carole, @peternobes, based, anything, pop, @cassidymarcus, works, bath, spend, interesting, ref, wages, @geoffdunford, baths, candidates, #twertonparkclash, hustings, @shaunywilliams, @stfcstore