Text Mining
Are you looking for patterns in large sets of text, or researching ways to make sense of textual data using sentiment analysis, topic modeling, or other methods? Whether you’re new to text mining or stuck on a text mining question, we’re here to help!
What support is available for Text Mining?
We can help you with:
- Starting text mining projects
- Web scraping, information retrieval, and text collection methods (e.g., APIs)
- Machine learning for classification and clustering
- Natural language processing
- Python, R, SQL
If you need help with any of the topics above, please contact us at uwlib-openscholarship@uw.edu.
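Sentiment analysis and other text classification tasks often start from a simple baseline model. A minimal sketch of one such baseline follows: a multinomial naive Bayes classifier in plain Python with no third-party packages. The four training sentences, their labels, and the function names (`tokenize`, `train`, `predict`) are hypothetical and for illustration only. For real projects, a tested library implementation such as scikit-learn's `MultinomialNB` is the usual choice.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return runs of letters (digits and punctuation dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def train(docs):
    """docs is a list of (text, label) pairs; returns per-class doc counts, word counts, vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior plus summed log likelihoods, with add-one (Laplace) smoothing.
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data.
training = [
    ("great helpful workshop", "positive"),
    ("loved the clear examples", "positive"),
    ("confusing and slow session", "negative"),
    ("the examples were confusing", "negative"),
]
model = train(training)
print(predict(model, "clear and helpful"))  # positive
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. Add-one smoothing keeps a single word that is absent from one class from forcing that class's probability to zero.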
Resources
Tools
- Web scraping:
  - Programming-based: Beautiful Soup, Scrapy, Selenium
  - Commercial software (free and paid): ParseHub, Dexi.io, Scraping-bot.io
- Text cleaning:
  - textclean - Collection of open-source tools for cleaning and normalizing text documents in R
  - OpenRefine - Open-source data-cleaning tool (formerly Google Refine)
  - Trifacta Wrangler - Free tool for data preparation
- Text analytics and visualization:
  - Gale Digital Scholar Lab - Apply natural language processing tools to the raw OCR text of Gale Primary Sources in a single research platform
  - ProQuest TDM - Text-mine large sets of news, scholarly, and other publications that UW Libraries licenses from ProQuest
  - Rosette Text Analytics - Suite of interoperable components for text analytics
  - WordStat - Software for advanced content analysis
  - Apache OpenNLP - Java machine learning toolkit for common NLP tasks such as tokenization, sentence detection, part-of-speech tagging, named entity recognition, and document categorization
  - Natural Language Toolkit (NLTK) - Python platform for working with human language data, with libraries for tokenization, stemming, tagging, parsing, and classification
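The scraping, cleaning, and analytics tools above often run in sequence: collect a page, clean its text, then analyze it. A minimal sketch of that pipeline in plain Python follows, using a small hypothetical HTML page. It uses the standard-library `html.parser` in place of Beautiful Soup so it runs without installing anything. The class and function names (`ParagraphExtractor`, `clean`, `top_terms`) and the tiny stopword list are illustrative assumptions, not part of any tool listed here.

```python
import re
import unicodedata
from collections import Counter
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements (stand-in for a Beautiful Soup query)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


def clean(text):
    """Normalize Unicode (NFKC), lowercase, and keep only runs of letters."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.findall(r"[^\W\d_]+", text)


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is"}


def top_terms(html, n=3):
    """Scrape paragraph text, clean it, drop stopwords, and count term frequencies."""
    parser = ParagraphExtractor()
    parser.feed(html)
    tokens = [t for p in parser.paragraphs for t in clean(p) if t not in STOPWORDS]
    return Counter(tokens).most_common(n)


page = """<html><body>
<h1>Library News</h1>
<p>The library hosts text mining workshops.</p>
<p>Text mining workshops cover Python and R.</p>
</body></html>"""

print(top_terms(page))  # [('text', 2), ('mining', 2), ('workshops', 2)]
```

Each step could be replaced by one of the listed tools, for example Beautiful Soup for extraction, textclean or OpenRefine for cleaning, or NLTK for tokenization and stopword lists. The scrape, clean, and count structure would typically stay the same.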
Workshops
Software Carpentry Workshops
Watch for quarterly workshops offered through the eScience Institute to build skills in R or Python.
Available to: Current students, faculty, and staff
Offered: Quarterly
More information: https://uwescience.github.io/carpentries/