
An emoji recommending service that finds you the most similar emojis for a word with word2vec.

Usage

First, you'll need a word2vec dataset. Then, you can run either bymaxdegree.py or bysimilarlist.py for a console-based emoji suggestion service (your console will need to support emojis), or you can run emojiserver/emojiserver.py to spawn a webserver. The server is a bit more organized than the other two files in the root directory, and it also has a larger feature set, as more time has been spent on it.
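If you don't already have a dataset, the pretrained Google News vectors referenced in the implementation notes below are a common choice. Here is a minimal sketch of loading them; the use of gensim and the filename are assumptions, not necessarily what the scripts in this repo do:

```python
# Minimal sketch: load a pretrained word2vec dataset with gensim.
# The filename below is the commonly distributed Google News binary (~3.4 GB);
# adjust the path to wherever your dataset lives.
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin",  # assumed filename
    binary=True,
)

# Quick sanity check: nearest neighbours of an arbitrary word.
print(model.most_similar("pizza", topn=5))
```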

Things are currently done two ways (bymaxdegree.py and bysimilarlist.py contain the different implementations):

Common implementation

Both methods use the emojilib library to obtain a set of keywords in addition to the emoji names (also provided by this library). From this set of emoji names and keywords, they create a set of words containing both emoji names and keywords that are later mapped back to the emoji names.

bymaxdegree.py

Ideally one could use the entire word2vec model to compare the lookup word with the word set and find the closest emoji. However, the Google News dataset is quite large (3.4 GB!), so this implementation reduces the dataset to include only words reasonably similar to emojis (about 310 MB). It can then take the dot product of each word in the word set with the queried word and find the words that yield the greatest dot product.
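As a rough illustration of the two steps above (the shared word set and the dot-product ranking), here is a hedged sketch; it is not the actual bymaxdegree.py code, and the emoji_data sample, the suggest() helper, and the gensim usage are assumptions:

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumption: emoji_data maps each emoji name to its keywords, roughly the shape
# of what emojilib provides; the real scripts obtain this from the emojilib library.
emoji_data = {
    "pizza": ["food", "party", "cheese"],
    "dog": ["animal", "pet", "puppy"],
}

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Common step: build the word set of emoji names and keywords,
# each mapped back to the emoji name(s) it came from.
word_to_emoji = {}
for emoji, keywords in emoji_data.items():
    for word in [emoji, *keywords]:
        if word in model:
            word_to_emoji.setdefault(word, set()).add(emoji)

def suggest(query, topn=3):
    """Rank the emoji words by dot product with the queried word's vector."""
    if query not in model:
        return []
    q = model[query]
    scored = sorted(
        ((float(np.dot(model[word], q)), word) for word in word_to_emoji),
        reverse=True,
    )
    return [(word, word_to_emoji[word]) for _, word in scored[:topn]]

print(suggest("puppy"))
```

Note that the sketch loads the full model for brevity, whereas bymaxdegree.py works on the reduced, emoji-similar subset described above.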


bysimilarlist.py

This algorithm finds the 100 most similar words for each word in the word set, then creates a map from this data. It then goes through and reverses this map, creating a new map from words to their emojis. When a word is queried, it looks up the relevant emoji in this map. While this implementation is quite fast and memory efficient, the number of words that get mapped to emojis is quite small.
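A hedged sketch of this map-and-reverse idea (assuming gensim and a word_to_emoji dict built as in the common step; the real bysimilarlist.py may differ in its details):

```python
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Assumption: word_to_emoji maps each emoji name/keyword back to its emoji,
# as produced by the common step; a tiny hand-written sample is used here.
word_to_emoji = {"puppy": {"dog"}, "pet": {"dog"}, "food": {"pizza"}}

# For each word in the word set, precompute its 100 most similar words...
similar = {
    word: [w for w, _score in model.most_similar(word, topn=100)]
    for word in word_to_emoji
    if word in model
}

# ...then reverse the map so that any of those similar words points at the emoji.
reverse = {}
for word, neighbours in similar.items():
    for neighbour in neighbours:
        reverse.setdefault(neighbour, set()).update(word_to_emoji[word])

def suggest(query):
    """A query is just a dictionary lookup: fast, but only words that appear
    in some precomputed neighbour list are covered."""
    return reverse.get(query, set())

print(suggest("kitten"))
```

This makes the fast-but-sparse tradeoff concrete: lookups are constant time, but any word outside the precomputed neighbour lists returns nothing.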
