
Data Collection Basics

Table of contents
  1. Introduction
  2. Efficient Design
  3. Staying Consistent
  4. Eliminating Redundancy
  5. Accountable Recording
  6. Team-based Problem Solving

Introduction

The CHCD currently relies on bulk historical data collected in .xls, .csv, and .gsheet spreadsheets. Due to the nature of historical research, however, there is no “standard” spreadsheet to use for data collection. Historical sources rarely have information on every node type, relationship type, or property in the CHCD. Furthermore, a spreadsheet that contained rows and columns for each point of data in the database would quickly become overwhelming.

As such, the CHCD project team and project partners must design customized spreadsheets that fit the shape of their historical materials and allow for the efficient collection of data. The following principles can help guide research teams as they design their own spreadsheets.


Efficient Design

Once spreadsheets are received from project partners, the CHCD project team will put all of the recorded historical data through a process of data cleaning. This process seeks to eliminate redundancies and convert all of the historical data into triplet form (i.e., node-relationship-node), so that it can be input into the database.

This means that the organization of a spreadsheet should prioritize efficient data collection. Do what works best for your historical materials.
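To make the triplet form concrete, the sketch below shows how one spreadsheet row might be unpacked into node-relationship-node triplets during cleaning. The column names (`person`, `institution`, `city`), the relationship labels (`PRESENT_AT`, `LOCATED_IN`), the sample row, and both helper functions are hypothetical illustrations, not the CHCD's actual schema or cleaning process.

```python
import csv
import io

# Hypothetical spreadsheet contents; column names and values are illustrative only.
SAMPLE_CSV = """person,institution,city
Jane Doe,Example Mission,Example City
"""


def row_to_triplets(row):
    """Turn one spreadsheet row into (node, relationship, node) triplets.

    The relationship labels are assumptions made for this sketch.
    """
    triplets = []
    if row["person"] and row["institution"]:
        triplets.append((row["person"], "PRESENT_AT", row["institution"]))
    if row["institution"] and row["city"]:
        triplets.append((row["institution"], "LOCATED_IN", row["city"]))
    return triplets


def read_triplets(text):
    """Read CSV text and collect the triplets from every row."""
    triplets = []
    for row in csv.DictReader(io.StringIO(text)):
        triplets.extend(row_to_triplets(row))
    return triplets


triplets = read_triplets(SAMPLE_CSV)
```

A single wide row can yield several triplets, which is why the spreadsheet layout itself does not need to mirror the database.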


Staying Consistent

The freedom of spreadsheet design must be balanced by consistency in the process of data collection. Because the completed spreadsheet will most likely be cleaned by someone who did not record the data, try to adhere to the following principles:

  • Readability: it should be apparent what each column, row, and cell is recording.
  • Succinctness: the properties and relationships in the database are almost all made up of one to two words. Make sure you are recording your data in a similar manner. Do not use long-form prose to record data (i.e., no sentences, phrases, or paragraphs).
  • Transparent Headers: make sure your headers communicate what is in the column. If recording more than one property in a cell, record the format of the data in the header.
  • Consistent Denotation: when recording multiple kinds of data in a cell, make sure to use a consistent system of denotation. Commas, semicolons, parentheses, and periods should each serve a distinct purpose.
  • Placeholders: when a piece of data is missing from a certain cell, use a special placeholder symbol (e.g., --, ??, or ~) to ensure that you maintain a consistent data format.
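
As an illustration of why consistent denotation and placeholders matter, the sketch below parses a cell whose header declares its format. Everything here is a hypothetical convention chosen for the example: a header such as "staff: name (role); name (role)", semicolons separating entries, parentheses wrapping the role, and `--` as the placeholder for missing data.

```python
import re

PLACEHOLDER = "--"  # hypothetical placeholder symbol for missing data

# Matches one "name (role)" entry, as declared in the hypothetical header.
ENTRY_PATTERN = re.compile(r"^(?P<name>[^()]+?)\s*\((?P<role>[^()]*)\)$")


def parse_staff_cell(cell):
    """Split a cell on semicolons, then read each 'name (role)' entry."""
    entries = []
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue
        match = ENTRY_PATTERN.match(part)
        if match is None:
            # An entry that breaks the declared format cannot be read reliably.
            raise ValueError(f"entry does not follow the header format: {part!r}")
        name = match.group("name").strip()
        role = match.group("role").strip()
        entries.append({
            "name": None if name == PLACEHOLDER else name,
            "role": None if role == PLACEHOLDER else role,
        })
    return entries


staff = parse_staff_cell("Jane Doe (Teacher); John Roe (--)")
```

Because each symbol does exactly one job, the person cleaning the data can split the cell mechanically, and the placeholder makes a missing role distinguishable from a forgotten one.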

Eliminating Redundancy

Redundancy is what happens when you record the same information in multiple places. This practice is problematic because it increases the risk of errors and lengthens the time it takes to clean the data. When designing your spreadsheet, ask yourself the following questions:

  • Is this data recorded elsewhere in my spreadsheet?
  • Is this data inherent in another data point?
  • Is it possible for this data to be contradicted somewhere else in my spreadsheet?
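
The last question can also be checked mechanically once data has been entered. The sketch below is one possible approach, not part of the CHCD workflow: it groups rows by a key column and reports any key whose repeated field carries more than one value. The column names and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in which a person's birth year is repeated on every line.
ROWS = [
    {"person": "Jane Doe", "birth_year": "1870", "institution": "Example Mission"},
    {"person": "Jane Doe", "birth_year": "1871", "institution": "Example School"},
    {"person": "John Roe", "birth_year": "1865", "institution": "Example Mission"},
]


def find_contradictions(rows, key, field):
    """Return {key value: sorted distinct field values} where the values disagree."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[field])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


conflicts = find_contradictions(ROWS, "person", "birth_year")
```

Here the repeated `birth_year` column allows two rows for the same person to disagree. Recording the birth year once, for example in a separate sheet of people, would remove the possibility of contradiction altogether.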

Accountable Recording

The CHCD is committed to both accessibility and academic excellence. As such, it is essential that data is adequately paired with the sources it comes from. How this pairing is reflected in your spreadsheets, however, will largely depend on the nature of your historical sources and data collection goals.

If you are working from a single source, for example, it may be enough to record only page numbers. If you are working with multiple sources, consider using an abbreviated citation format.

Whatever route you choose, make sure that the origin of the data is clear so that the project team can adequately reflect the source of the data in the CHCD.
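
For teams using abbreviated citations, a quick check like the one below can confirm that every row points to a known source before the spreadsheet is handed over. This is only a sketch: the abbreviations, the source titles, and the `source` and `page` column names are invented for the example and do not reflect any CHCD requirement.

```python
# Hypothetical key mapping abbreviations to full citations.
SOURCES = {
    "DIR1": "Hypothetical Mission Directory, vol. 1",
    "DIR2": "Hypothetical Mission Directory, vol. 2",
}

# Hypothetical data rows; the second row has no source recorded.
ROWS = [
    {"person": "Jane Doe", "source": "DIR1", "page": "45"},
    {"person": "John Roe", "source": "", "page": "--"},
    {"person": "Mary Poe", "source": "DIR3", "page": "12"},
]


def rows_missing_source(rows, sources):
    """Return the 1-based positions of rows whose source abbreviation is not in the key."""
    return [
        index
        for index, row in enumerate(rows, start=1)
        if row.get("source", "").strip() not in sources
    ]


unsourced = rows_missing_source(ROWS, SOURCES)
```

Rows flagged this way either lack a source entirely or use an abbreviation that was never defined, and both cases would leave the project team unable to trace the data back to its origin.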


Team-based Problem Solving

Historical data sources are complicated, and issues are bound to arise during the process of data collection. It is common to encounter historical data that cause you to ask questions like, “How would I record this?” or “Should I record this?” These sorts of encounters often challenge the data structure you began your spreadsheet with and require you to adjust.

If any of these questions arise, feel free to contact the CHCD project team to troubleshoot your historical sources and spreadsheet design.

We are in this together!



Copyright © 2021 China Historical Christian Database.