NLP Course documentation

🤗 Datasets, check!

Well, that was quite a tour through the 🤗 Datasets library. Congratulations on making it this far! With the knowledge that you've gained from this chapter, you should be able to:

  • Load datasets from anywhere, be it the Hugging Face Hub, your laptop, or a remote server at your company.
  • Wrangle your data using a mix of the Dataset.map() and Dataset.filter() functions.
  • Quickly switch between data formats like Pandas and NumPy using Dataset.set_format().
  • Create your very own dataset and push it to the Hugging Face Hub.
  • Embed your documents using a Transformer model and build a semantic search engine using FAISS.

In Chapter 7, we'll put all of this to good use as we take a deep dive into the core NLP tasks that Transformer models are great for. Before jumping ahead, though, put your knowledge of 🤗 Datasets to the test with a quick quiz!
