Introduction
In Digital Humanities (DH) research, text analysis is one of the most fundamental methodologies. Structural analysis of literary works and historical documents requires an environment in which texts can be annotated systematically and then analyzed quantitatively.
CATMA (Computer Assisted Text Markup and Analysis) is a web-based text annotation and analysis platform developed by forTextLab at the University of Hamburg. It allows researchers to tag texts and perform analysis through an intuitive interface, without requiring any programming knowledge.
Key Features of CATMA
1. Custom Tagsets
CATMA’s greatest strength is the ability to freely define tagsets. Rather than being constrained by existing markup schemas, you can design annotation schemes tailored to your research objectives.
For literary research, you might create categories such as “narrator’s perspective,” “metaphorical expression,” or “character emotion,” and apply them to relevant passages. Tags can be organized hierarchically, from broad categories to specific subcategories.
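A hierarchical tagset is essentially a tree of categories. The Python sketch below models one as nested nodes and lists every tag by its full path from the root. The `Tag` class, the `tag_paths` function, and the example subcategories are hypothetical illustrations, not CATMA's internal tagset format.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One node in a hypothetical hierarchical tagset (not CATMA's internal format)."""
    name: str
    children: list["Tag"] = field(default_factory=list)

    def add(self, name: str) -> "Tag":
        """Create a child tag, attach it, and return it so subtags can be nested."""
        child = Tag(name)
        self.children.append(child)
        return child


def tag_paths(tag: Tag, prefix: str = "") -> list[str]:
    """Return every tag as a slash-separated path from the root, depth first."""
    path = f"{prefix}/{tag.name}" if prefix else tag.name
    paths = [path]
    for child in tag.children:
        paths.extend(tag_paths(child, path))
    return paths


# Illustrative literary tagset, organized from broad categories to subcategories.
root = Tag("Literary Analysis")
perspective = root.add("Narrator's perspective")
perspective.add("First person")
perspective.add("Third person")
figurative = root.add("Figurative language")
figurative.add("Metaphorical expression")
root.add("Character emotion")

for p in tag_paths(root):
    print(p)
```

Listing tags by full path makes it easy to check that subcategories sit under the intended parent before annotation begins.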
2. Collaborative Work
As a web-based platform, CATMA lets multiple researchers work on the same project. Each researcher can annotate independently; the resulting annotations can then be compared to assess inter-annotator agreement, and merged.
This is particularly valuable in humanities text analysis, where subjective interpretation plays a significant role. Comparing annotations from multiple perspectives enhances the reliability of the analysis.
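Agreement between two annotators is often quantified with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below assumes each annotator assigned exactly one tag per segment and that the labels have been exported into two parallel lists. The function name and the data are hypothetical, and kappa is a standard measure shown here for illustration, not a description of how CATMA itself computes agreement.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who each assigned one tag per segment."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty list of segments")
    n = len(labels_a)
    # Observed agreement: share of segments where both annotators chose the same tag.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's tag proportions, summed over tags.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical tag everywhere; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six segments.
annotator_a = ["metaphor", "metaphor", "emotion", "none", "emotion", "none"]
annotator_b = ["metaphor", "emotion", "emotion", "none", "emotion", "metaphor"]
print(f"kappa = {cohens_kappa(annotator_a, annotator_b):.2f}")  # prints kappa = 0.50
```

Here the annotators agree on 4 of 6 segments (p_o ≈ 0.67), but chance alone would produce p_e ≈ 0.33, so kappa is 0.5 rather than 0.67.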
3. Text Analysis Features
CATMA provides various analysis functions for annotated texts:
- Frequency Analysis: Aggregate tag occurrences to understand overall trends
- Distribution Analysis: Visualize where tags appear within the text
- KWIC (Key Word in Context): Display context surrounding specific keywords or tags
- Query Functions: Advanced searches combining multiple tags and keywords
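To make these four operations concrete, the following Python sketch reproduces them on a toy text. It assumes annotations are stored as hypothetical `(tag, start, end)` character-offset records; this layout and all function names are illustrative assumptions, not CATMA's export format or API.

```python
from collections import Counter

# Toy text for illustration only.
TEXT = ("The sea was a mirror. She felt a quiet dread. "
        "The sea was a grave. He laughed, but the dread remained.")


def annotate(tag: str, phrase: str) -> tuple[str, int, int]:
    """Hypothetical (tag, start, end) record for the first occurrence of phrase."""
    start = TEXT.index(phrase)
    return tag, start, start + len(phrase)


ANNOTATIONS = [
    annotate("metaphor", "The sea was a mirror"),
    annotate("emotion", "quiet dread"),
    annotate("metaphor", "The sea was a grave"),
    annotate("emotion", "dread remained"),
]


def tag_frequencies(annotations):
    """Frequency analysis: number of occurrences of each tag."""
    return Counter(tag for tag, _, _ in annotations)


def tag_distribution(annotations, text_length: int, bins: int = 4):
    """Distribution analysis: occurrences of each tag per equal-sized text segment,
    assigning each annotation to the segment in which it starts."""
    dist = {}
    for tag, start, _ in annotations:
        segment = start * bins // text_length
        dist.setdefault(tag, [0] * bins)[segment] += 1
    return dist


def kwic(text: str, keyword: str, width: int = 15):
    """KWIC: each occurrence of keyword with up to `width` characters of context."""
    lines = []
    pos = text.find(keyword)
    while pos != -1:
        left = text[max(0, pos - width):pos]
        right = text[pos + len(keyword):pos + len(keyword) + width]
        lines.append(f"{left:>{width}}[{keyword}]{right}")
        pos = text.find(keyword, pos + 1)
    return lines


def query(text: str, annotations, tag: str, keyword: str):
    """Query: passages carrying `tag` whose covered text also contains `keyword`."""
    return [text[s:e] for t, s, e in annotations if t == tag and keyword in text[s:e]]


print(tag_frequencies(ANNOTATIONS))
print(tag_distribution(ANNOTATIONS, len(TEXT)))
for line in kwic(TEXT, "sea"):
    print(line)
print(query(TEXT, ANNOTATIONS, "metaphor", "grave"))
```

Assigning each annotation to the segment where it starts is a simplification: a passage that crosses a segment boundary is counted only once.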
4. GitLab-Based Project Management
CATMA uses GitLab internally to manage project data. As a result, the change history of annotations is recorded automatically, and you can revert to previous versions. This keeps the research process transparent and supports reproducible scholarship.
Practical Applications
Literary Research
Consider using CATMA for narrative analysis of modern literature. Upload a novel text and create tagsets for “direct speech,” “indirect speech,” and “free indirect speech.” By tagging relevant passages, you can quantitatively assess the distribution of narrative techniques across the entire work.
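One way to quantify that distribution is to compute, per chapter, the share of tagged text each speech type accounts for. The sketch below assumes the tagged passages have been exported as hypothetical `(chapter, tag, length)` records, with lengths in characters; the records and the function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical export: (chapter, tag, length of tagged passage in characters).
RECORDS = [
    (1, "direct speech", 200),
    (1, "direct speech", 100),
    (1, "indirect speech", 100),
    (1, "free indirect speech", 100),
    (2, "direct speech", 100),
    (2, "free indirect speech", 300),
]


def speech_shares(records):
    """Share of tagged characters per speech type, computed separately for each chapter."""
    totals = defaultdict(int)
    by_chapter = defaultdict(lambda: defaultdict(int))
    for chapter, tag, length in records:
        totals[chapter] += length
        by_chapter[chapter][tag] += length
    return {
        chapter: {tag: n / totals[chapter] for tag, n in tags.items()}
        for chapter, tags in sorted(by_chapter.items())
    }


for chapter, shares in speech_shares(RECORDS).items():
    summary = ", ".join(f"{tag} {share:.0%}" for tag, share in shares.items())
    print(f"Chapter {chapter}: {summary}")
```

The shares are relative to the tagged text in each chapter, not to the whole chapter. Weighting by character count keeps one long monologue from counting the same as a short quoted phrase; counting passages instead is an equally reasonable choice, depending on the research question.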
Historical Document Analysis
CATMA is also useful for analyzing historical letters and official documents. Define entity tags like “person name,” “place name,” “date,” and “event” to structure the information within documents, enabling you to map relationships between people and organize events chronologically.
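Once entities are tagged, relationships and chronologies can be derived with simple aggregation. The sketch below assumes a hypothetical per-document summary of person tags and dated event tags. It counts how often two persons are named in the same letter (a basic co-occurrence network) and sorts events by date. All names, dates, and the data layout are invented for illustration.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical entity annotations, summarized per document (one letter each).
LETTERS = {
    "letter_01": {"persons": {"Anna", "Karl"},
                  "events": [(date(1851, 5, 2), "meeting in Hamburg")]},
    "letter_02": {"persons": {"Anna", "Karl", "Lotte"},
                  "events": [(date(1850, 11, 20), "journey")]},
    "letter_03": {"persons": {"Karl", "Lotte"},
                  "events": [(date(1852, 1, 9), "wedding")]},
}


def cooccurrence(letters):
    """Count how often each pair of persons is named in the same document."""
    pairs = Counter()
    for doc in letters.values():
        # Sorting makes each pair order-independent, e.g. ("Anna", "Karl").
        for pair in combinations(sorted(doc["persons"]), 2):
            pairs[pair] += 1
    return pairs


def timeline(letters):
    """All tagged events across documents as (date, event, document) tuples, sorted by date."""
    events = [(when, what, doc_id)
              for doc_id, doc in letters.items()
              for when, what in doc["events"]]
    return sorted(events)


for (a, b), count in cooccurrence(LETTERS).most_common():
    print(f"{a} - {b}: {count}")
for when, what, doc_id in timeline(LETTERS):
    print(when.isoformat(), what, f"({doc_id})")
```

The pair counts can serve as weighted edges if you later draw the relationships as a network.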
Corpus Linguistics
By applying consistent annotations to large text datasets, you can build a foundation for linguistic analysis. Tag parts of speech and syntactic structures to analyze patterns in language use.
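A common first step in this kind of analysis is counting part-of-speech sequences. The sketch below counts POS bigrams in two hypothetical sentences that are assumed to be already tagged with Universal Dependencies-style labels; the data and the function name are illustrative only.

```python
from collections import Counter

# Hypothetical POS-tagged sentences: lists of (token, part-of-speech) pairs.
TAGGED = [
    [("The", "DET"), ("old", "ADJ"), ("house", "NOUN"), ("stood", "VERB"), ("empty", "ADJ")],
    [("A", "DET"), ("cold", "ADJ"), ("wind", "NOUN"), ("blew", "VERB")],
]


def pos_bigrams(sentences):
    """Count adjacent part-of-speech pairs within each sentence (not across sentences)."""
    counts = Counter()
    for sentence in sentences:
        tags = [pos for _, pos in sentence]
        counts.update(zip(tags, tags[1:]))
    return counts


for (first, second), count in pos_bigrams(TAGGED).most_common():
    print(f"{first} {second}: {count}")
```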
Getting Started
CATMA is accessible through any web browser:
- Visit the CATMA website and create an account
- Create a new project and upload your text(s)
- Define tagsets (you can also use existing templates)
- Begin annotation work on the text
- Use the analysis features to visualize and aggregate data
Tips and Considerations
- Upload texts in plain text format (UTF-8) for the most stable experience
- For large texts, consider splitting them into appropriate units before uploading
- Tagset design significantly impacts analysis results, so plan carefully in advance
- When collaborating, create annotation guidelines beforehand to ensure consistency
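The first two tips can be handled before upload with a short script. This Python sketch decodes a file from a known source encoding (for example `latin-1` or `cp1252`), splits it at chapter headings, and writes each unit as a separate UTF-8 plain-text file. The heading pattern `Chapter <number>`, the function names, and the file naming scheme are assumptions to adapt to your own texts.

```python
import re
from pathlib import Path

# Hypothetical heading pattern: lines that start with "Chapter" and a number.
CHAPTER_HEADING = re.compile(r"^Chapter \d+", re.MULTILINE)


def to_utf8_text(raw: bytes, source_encoding: str = "utf-8") -> str:
    """Decode bytes from a known source encoding and drop a leading byte-order mark."""
    return raw.decode(source_encoding).lstrip("\ufeff")


def split_chapters(text: str) -> list[str]:
    """Split text at chapter headings; any front matter becomes its own unit."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    units = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [unit for unit in units if unit]


def write_units(units: list[str], out_dir: Path, stem: str) -> list[Path]:
    """Write each unit as a numbered UTF-8 plain-text file ready for upload."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, unit in enumerate(units, start=1):
        path = out_dir / f"{stem}_{number:02d}.txt"
        path.write_text(unit + "\n", encoding="utf-8")
        paths.append(path)
    return paths


# Example usage (hypothetical file names):
# raw = Path("novel_latin1.txt").read_bytes()
# units = split_chapters(to_utf8_text(raw, "latin-1"))
# write_units(units, Path("upload"), "novel")
```

The source encoding has to be known or checked by hand; decoding with the wrong one produces garbled characters rather than an error for single-byte encodings such as `latin-1`.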
Summary
CATMA is a powerful tool that enables sophisticated text annotation and analysis without requiring programming skills. With customizable tagsets, collaborative work support, and GitLab-based version control, it addresses the needs of humanities researchers. It is an ideal starting point for anyone looking to incorporate text analysis into their DH research.