Natural language processing

What is natural language processing?

Natural language processing, or NLP, is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. NLP is a branch of AI, but in practice it is a mixture of disciplines such as linguistics, computer science, and engineering.

There are a number of approaches to NLP, ranging from rule-based modelling of human language to statistical methods. Common uses of NLP include speech recognition systems, the voice assistants available on smartphones, and chatbots.

Is NLP the same as AI or machine learning?

Natural language processing is a subfield of artificial intelligence that focuses on the interaction between computers and human language. AI is a broad field that encompasses many different areas, including robotics, computer vision, machine learning, and more. NLP specifically deals with how computers can understand, interpret, and generate human language.

It is possible to develop natural language processing systems with no machine learning. For example, a simple chatbot such as Joseph Weizenbaum’s ELIZA, which applies manually written rules to simulate a psychiatrist’s conversation, is an NLP system that contains no machine learning at all. Likewise, a supermarket chain’s machine learning system which learns from customers’ purchases and recommends future products contains no NLP at all. But all of these belong under the umbrella of artificial intelligence.
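A rule-based chatbot in this spirit can be sketched in a few lines of Python. The patterns and responses below are illustrative toy rules, not Weizenbaum's original ELIZA script:

```python
import re

# A few hand-written pattern → response rules in the spirit of ELIZA.
# These rules are invented for illustration.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(utterance: str) -> str:
    """Apply the first matching rule; fall back to a generic prompt."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."

print(respond("I am worried"))  # Why do you say you are worried?
```

No learning takes place here: the system's entire behaviour is fixed by the hand-written rules, which is what makes it NLP without machine learning.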

[Diagram: artificial intelligence encompasses both NLP and machine learning. ELIZA is NLP without ML; face recognition is ML without NLP; statistical NLP, such as Google Translate, lies in the overlap.]

Examples of natural language processing

We encounter a number of common applications of NLP every day. Here are a few examples:


Email filters

Spam filters were some of the earliest applications of NLP. Nowadays email providers such as Gmail offer a sophisticated classification of emails into primary, social, or promotions.

Machine translation

The state of the art in machine translation is currently neural machine translation. Tech giants such as Google and Microsoft can leverage their large corpora of texts in multiple languages to train neural networks to produce high-accuracy translations. Neural machine translation uses vector representations known as 'embeddings' to model the meaning of a word or sentence.

Predictive text

Texting in the pre-smartphone days was so laborious that a whole dialect of English, "textspeak", grew up around it! Nowadays textspeak is a thing of the past: autocorrect and autocomplete allow us to write a small fraction of a sentence while a language model suggests the most probable words to complete the message. Predictive text can also learn from what you write. Have you ever picked up a friend's phone and found its suggestions oddly unfamiliar? Over time, your phone gets used to the idiosyncrasies of your writing, and its predictions become a weighted mixture of a general language model and personalised statistics.
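The "weighted mixture" idea can be sketched with toy bigram models. The corpora, the weight of 0.7, and the `predict` function below are all hypothetical choices for illustration:

```python
from collections import Counter, defaultdict

def bigram_counts(corpus: str):
    """Count word bigrams in a whitespace-tokenised corpus."""
    counts = defaultdict(Counter)
    tokens = corpus.lower().split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

# Toy corpora: a 'general' language model and one user's own messages.
general = bigram_counts("see you soon see you later see you soon")
personal = bigram_counts("see ya see ya see ya")

def predict(prev_word: str, weight: float = 0.7):
    """Suggest the next word as a weighted mixture of personal and general stats."""
    scores = Counter()
    for model, w in ((personal, weight), (general, 1.0 - weight)):
        total = sum(model[prev_word].values()) or 1
        for word, count in model[prev_word].items():
            scores[word] += w * count / total
    return scores.most_common(1)[0][0] if scores else None

print(predict("see"))  # ya
```

With the personal model weighted at 0.7, the phone suggests "ya" after "see"; lowering the weight towards zero makes the general model's "you" win instead.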

How does natural language processing work?

There are a number of approaches to processing natural language, as no two NLP tasks are the same. However, we can divide NLP into two broad approaches: symbolic NLP and statistical NLP, although hybrids of the two are becoming increasingly popular.

Symbolic NLP was the dominant approach from the 1950s to the 1990s, and involved programmers coding grammar rules and ontologies into a system, cataloguing real-world and linguistic knowledge. Statistical NLP is currently the dominant approach, where machine learning algorithms such as neural networks are trained on vast corpora of data, and learn common patterns without being taught the grammar of a language.

Phases of natural language processing

Traditional NLP pipeline

A traditional NLP pipeline follows a series of steps to turn a sentence into something that a computer can handle. This is the approach taken by a number of widely used NLP libraries, such as spaCy and Natural Language Toolkit (NLTK), although not all steps are always present.

  1. Tokenisation: Breaking down an input text into smaller chunks like words or sentences.

  2. Stop-word removal: Eliminating very common words such as “a”, “an”, and “the”, which carry little meaning on their own.

  3. Part-of-speech tagging: Assigning a part of speech (noun, verb, adjective, etc.) to each word in a text.

  4. Named Entity Recognition: Identifying and classifying entities like people, places, and organisations in a text.

  5. Sentiment Analysis: Identifying the overall emotion or sentiment behind a piece of text, such as positive, negative, or neutral.

  6. Machine Learning: Using algorithms to analyse patterns in the text and learn from them.
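The first few steps of the pipeline can be sketched in pure Python. The stop-word list and the suffix-based tagger below are toy illustrations; a real pipeline would use a library such as spaCy or NLTK with full linguistic resources:

```python
import re

# Toy stop-word list for illustration; real lists contain hundreds of words.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}

def tokenise(text: str):
    """Step 1: split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Step 2: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_pos_tag(tokens):
    """Step 3: a toy part-of-speech tagger based only on word endings."""
    tags = []
    for t in tokens:
        if t.endswith("ing") or t.endswith("ed"):
            tags.append((t, "VERB"))
        elif t.endswith("ly"):
            tags.append((t, "ADV"))
        else:
            tags.append((t, "NOUN"))
    return tags

tokens = remove_stop_words(tokenise("The cat is chasing a laser quickly"))
print(crude_pos_tag(tokens))
# [('cat', 'NOUN'), ('chasing', 'VERB'), ('laser', 'NOUN'), ('quickly', 'ADV')]
```

Each stage consumes the output of the previous one, which is why this approach is called a pipeline.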

However, with the advent of neural networks and deep learning techniques in NLP, these pipelines are becoming less relevant.

Convolutional neural network for NLP

Convolutional neural networks (CNNs) were developed for computer vision problems, such as recognising handwritten digits on envelopes. However, in the 2010s they found widespread use in text processing, thanks to the invention of the Word2vec algorithm, which represents words as vectors in a high-dimensional space, so that a document can be converted into a matrix and processed as if it were an image.

The pipeline for a CNN is as follows:

  1. Tokenisation: The input text is broken down, as in traditional NLP.

  2. Word vectorisation: Words are converted to vectors according to a lookup table, and the entire document becomes an n×m matrix (n tokens, each represented by an m-dimensional vector).

  3. The matrix is passed into a convolutional neural network, which can perform tasks such as document classification.
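Step 2 can be illustrated with a toy lookup table. The three-dimensional embeddings below are made up for illustration; real tables learned by Word2vec have hundreds of dimensions:

```python
# Hypothetical 3-dimensional embedding lookup table; real systems use
# tables learned by algorithms such as Word2vec.
EMBEDDINGS = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "film":  [0.1, 0.9, 0.3],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary words

def vectorise(tokens):
    """Turn a token list into an n×m matrix (n tokens, m dimensions),
    which a CNN can then treat like a one-channel image."""
    return [EMBEDDINGS.get(token, UNKNOWN) for token in tokens]

matrix = vectorise(["good", "film", "overall"])
print(len(matrix), len(matrix[0]))  # 3 3 (three tokens, three dimensions)
```

Note that each word always receives the same vector regardless of context, which is the limitation the transformer models below address.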

Transformer neural networks for NLP

The state of the art is currently the transformer model. A transformer is a neural network which processes a sequence of tokens and calculates a vector representing each token which depends on the other tokens in the sequence. This is unlike the word vector method described in the previous section, as a word will have a different vector representation depending on its role in the sentence.

A transformer model such as BERT can transform a sentence into a single vector in high-dimensional space. Sentences with semantically similar content appear close together in the vector space.
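This closeness can be measured with cosine similarity. The three-dimensional sentence vectors below are invented for illustration; a real model such as BERT produces vectors with hundreds of dimensions:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical sentence embeddings, invented for this sketch.
sentences = {
    "The weather is sunny today": [0.9, 0.2, 0.1],
    "It is a bright, clear day":  [0.8, 0.3, 0.2],
    "The stock market fell":      [0.1, 0.9, 0.8],
}

a, b, c = sentences.values()
print(cosine_similarity(a, b) > cosine_similarity(a, c))  # True
```

The two weather sentences score higher than the unrelated pair, which is exactly the property that makes sentence embeddings useful for search and clustering.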

Business applications of natural language processing

NLP is used in a variety of business areas and industries.

| Business function | Application of NLP |
| --- | --- |
| Customer service | Chatbots on company websites. These reduce call centre costs and allow companies to run analytics on chat logs. |
| Customer service | Triaging incoming emails using a classifier. |
| Operations | Estimating the risk of a clinical trial protocol failing, or the cost of a repair. |
| Operations | Machine translation, e.g. Google Translate or Microsoft Azure Translator. |
| Market research | Speech recognition models such as OpenAI's Whisper can transcribe interviews with Key Opinion Leaders (KOLs) in pharma. |
| Operations | Document summarisation models. |
| Operations | Identifying key products or locations mentioned in a text and extracting their relationships. For example, if a doctor says “I would prescribe (DRUG) with (DRUG)”, a smart model may be able to identify the relationship between the entities. |
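A naive version of this kind of relationship extraction can be sketched with a pattern and a lexicon. The drug list and the `extract_drug_pairs` function below are hypothetical; production systems use trained NER and relation-extraction models:

```python
import re

# Hypothetical drug lexicon for illustration only.
DRUGS = {"aspirin", "ibuprofen"}

def extract_drug_pairs(text: str):
    """Find co-prescribed drug pairs in sentences like
    'I would prescribe X with Y'."""
    pattern = re.compile(r"prescribe (\w+) with (\w+)", re.IGNORECASE)
    pairs = []
    for a, b in pattern.findall(text):
        if a.lower() in DRUGS and b.lower() in DRUGS:
            pairs.append((a.lower(), b.lower()))
    return pairs

print(extract_drug_pairs("I would prescribe aspirin with ibuprofen"))
# [('aspirin', 'ibuprofen')]
```

A hand-written pattern like this breaks as soon as the phrasing varies, which is why statistical models dominate in practice.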

How can I learn about NLP?

If you have a background of studies in a different area, and would like to get into natural language processing, there are a number of books and other resources available to help you make the move.

Books about natural language processing

  1. Speech and Language Processing by Jurafsky and Martin
  2. Foundations of Statistical Natural Language Processing by Manning and Schütze
  3. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Bird et al
  4. Deep Learning by Goodfellow, Bengio et al

Video series and online courses

  1. Natural Language Processing by Jurafsky and Manning
  2. Coursera intro to Natural Language Processing
  3. CS224d: Deep Learning for Natural Language Processing (Stanford)


Blogs about natural language processing

  1. Language Log by Mark Liberman

How can I get a career in NLP?

First of all, an interest in languages and a career in NLP are two different things. If you want to get into NLP, you will need an interest in algorithms, problem solving, and linguistics.

  1. Learn the basics of NLP: Start by acquiring a basic understanding of NLP by working through some of the resources above, such as the Stanford NLP course.

  2. Develop strong computer science, coding, and software engineering skills: As NLP requires a lot of programming skills, proficiency in programming languages such as Python is crucial. You should also gain an understanding of the fundamentals of machine learning and deep learning.

  3. Gain practical experience: Work on NLP projects to gain practical experience. Participate in online forums and contribute to open source projects.

  4. Pursue advanced studies: You may consider pursuing a Master’s or Ph.D. in NLP or a related field to dive deeper into this area, if your finances and commitments permit.

  5. Network: Attend conferences, meetups, and join online groups to network with other NLP professionals and keep up-to-date with the latest trends in the field.

  6. Look for job opportunities: Look for NLP job openings on LinkedIn, company websites, and job boards. You can take on work as a freelancer on a platform such as Upwork to hone your skills, or join a company in an industry such as legal or pharmaceuticals, where there are large amounts of text data. Quite often, companies have nobody in-house who is prepared to deal with the headache of text data, so if that’s your cup of tea, you could find a very comfortable niche.

NLP tools

Some of the most popular NLP tools include:

  1. NLTK (Natural Language Toolkit)
  2. spaCy
  3. Stanford CoreNLP
  4. TensorFlow/Keras
  5. Gensim
  6. AllenNLP
  7. Apache OpenNLP
  8. HuggingFace Transformers
  9. IBM Watson Natural Language Understanding
  10. Microsoft Azure AI

In addition, there are large language models (LLMs) such as OpenAI’s GPT-3 and Meta’s LLaMA, which are disrupting the field.

NLP companies

Companies doing NLP include the big tech companies, as well as a number of smaller specialist firms.

There are also consultancies such as Fast Data Science.

How can I hire an NLP consultant?

There are a number of marketplaces to recruit freelance NLP specialists, such as Upwork or Fiverr. You can also contact me to arrange a consultation with my company Fast Data Science. My team and I have undertaken consulting projects for large private sector companies, startups, universities and non-profits. We are likely to be able to deliver value quickly and within your budget, and the initial conversation is free of charge.

History of natural language processing

The history of NLP can be traced back to the late 1940s, shortly after World War II, when scientists in the USA and the Soviet Union were attempting to build machines which could translate between languages such as English and Russian.

In 1950, the British computer scientist Alan Turing proposed a test to determine a machine’s ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. He called his test the “Imitation Game”.

I propose to consider the question, ‘Can machines think?’ This should begin with definitions of the meaning of the terms ‘machine’ and ‘think’.

– Alan Turing, in Mind (1950).

In 1957, Noam Chomsky published Syntactic Structures, developing his theory of Universal Grammar: that language is innate and hard-coded in the human brain, and that common rules underlie all human languages which cannot be explained simply by observing how children learn to speak.

From the 1950s to 1970s, researchers in NLP began to divide into two camps: those favouring a symbolic approach to modelling language, and those who preferred a stochastic approach. Symbolic NLP involves developing formal language rules, similar to what would be found in a Latin primer (“a sentence consists of a noun phrase followed by a verb phrase”). The polar opposite of this is stochastic NLP, where a program calculates statistics and probabilities, such as the frequency of words and pairs of words in a corpus.

In the 1970s, researchers developed formal logic-based languages such as Prolog, which could model legal questions or logical problems. Rule-based systems were developed for discourse modelling. Over the next few decades there was a gradual transition towards machine learning algorithms for NLP, due to the availability of computational power and a reduction in the importance of “purist” linguistics such as Chomsky’s theories.

By the 2000s, large amounts of text data were widely available and companies such as Google were able to build large-scale statistical translation models. In the 2010s, a further shift took place towards neural networks.

In 2013, a team at Google introduced the Word2vec algorithm, which represents the words in a lexicon as points in a vector space, where the distance between points corresponds to semantic similarity. Then in 2017, Vaswani et al. introduced the transformer architecture in a paper called “Attention Is All You Need”, another leap forward for the field. Transformers have fuelled the recent explosion in large language models (LLMs) such as ChatGPT.

Today, NLP has begun to be widely used in consumer electronics as well as in business. Insurance, pharma or legal firms which need to process large numbers of documents may well resort to NLP to extract structured information, cluster items, analyse customer support logs, or predict future events.