Natural Language Processing in C# with Catalyst

Catalyst is an open-source library for C# designed to facilitate natural language processing (NLP) in .NET applications.

It provides advanced tools for text analysis, tokenization, and language modeling, allowing developers to integrate NLP capabilities into their applications efficiently and easily.

Catalyst's main strength is its focus on speed. It is designed to process text quickly, which makes it a good fit for applications that need responsive NLP.

Features of Catalyst

  • Tokenization and morphological analysis: Breaks the text down into words and analyzes their structure.
  • Part-of-speech (POS) tagging: Identifies and labels the parts of speech in the text.
  • Named entity recognition (NER): Detects and classifies named entities such as people, places, and organizations.
  • Language model: Facilitates the creation and use of language models for various NLP tasks.
  • Support for multiple languages: Compatible with a wide range of languages.
  • Extensible and modular: Allows integration with other components and extension of functionalities.

Catalyst is open source, and all of its code and documentation are available in the project's GitHub repository, curiosity-ai/catalyst.

Installing Catalyst

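To start using Catalyst in your .NET project, you first need to install the library via NuGet. You can do this through the NuGet Package Manager in Visual Studio or by running the following command in the Package Manager Console:

Install-Package Catalyst

The language models ship as separate NuGet packages (for example, Catalyst.Models.English), which are installed the same way.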

How to Use Catalyst

Text Tokenization

This example shows how to use Catalyst to tokenize a text into individual words and print the processed document as JSON.

using System;
using Catalyst;
using Mosaik.Core;

// Register each language before use (requires its NuGet package, here Catalyst.Models.English)
Catalyst.Models.English.Register();
Storage.Current = new DiskStorage("catalyst-models"); // local folder where models are stored

var sentence = "The quick brown fox jumps over the lazy dog";
var nlp = await Pipeline.ForAsync(Language.English); // the language must match the text
var doc = new Document(sentence, Language.English);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson()); // print the processed document as JSON
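
The JSON output is useful for inspection, but in an application you will usually want to walk the tokens directly. The following is a minimal sketch of how that might look, assuming the Catalyst and Catalyst.Models.English packages are installed. The members ToTokenList(), Value, and POS are used here as I understand them from the library's source, so check them against the version you have installed. The class name TokenListing is just a placeholder for this illustration.

```csharp
using System;
using System.Threading.Tasks;
using Catalyst;
using Mosaik.Core;

public static class TokenListing
{
    public static async Task Main()
    {
        // Assumes the Catalyst and Catalyst.Models.English NuGet packages are installed.
        Catalyst.Models.English.Register();
        Storage.Current = new DiskStorage("catalyst-models");

        var nlp = await Pipeline.ForAsync(Language.English);
        var doc = new Document("The quick brown fox jumps over the lazy dog", Language.English);
        nlp.ProcessSingle(doc);

        // Assumption: ToTokenList(), Value and POS are the member names
        // exposed by the installed Catalyst version.
        foreach (var token in doc.ToTokenList())
        {
            // One line per token: the token text and its part-of-speech tag.
            Console.WriteLine($"{token.Value}\t{token.POS}");
        }
    }
}
```

Each line of output pairs a token with the part-of-speech tag assigned by the pipeline, which ties the tokenization step to the POS tagging feature described earlier.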