Geospatial Data Management & Metadata
This Presentation:
- Data Management and Best Practices
- Overview of Geospatial Metadata
What do you mean, "Data Management"?
you plan, find, create/edit, analyze, describe, and share/preserve data
throughout these processes you make decisions about how to manage your data
Have you ever thought:
- How should I name my files?
- Where can I store my data?
- What should I keep track of when I make changes?
- How will I explain my data to others?
Data Workflows
GIS Analyst ⮕ IT Specialist ⮕ Community Member
Grad RA ⮕ Research Cluster ⮕ Funder
Workflows can break down
inadequately described data versions
meaningless filenames
changes in storage locations
...what else?
Data management plans
detailed data management plans are often necessary for funding proposals and satisfying grant requirements
data management planning tools:
DMPtool.org
DMP Assistant
A few basic geodata management best practices
File Naming Guidelines
best practices include being consistent, and keeping file names short and descriptive.
File Naming - Dates
Use YYYYMMDD format
Do: homework_20200319.txt
Don't: homework_19032020.txt
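One reason to prefer YYYYMMDD is that plain alphabetical sorting then matches chronological order. A minimal Python sketch, assuming a hypothetical helper named `dated_name` and made-up dates:

```python
from datetime import date


def dated_name(stem, day, ext):
    """Build a filename like homework_20200319.txt (hypothetical helper)."""
    return f"{stem}_{day.strftime('%Y%m%d')}.{ext}"


# Made-up dates, deliberately out of order.
names = [
    dated_name("homework", date(2020, 3, 19), "txt"),
    dated_name("homework", date(2019, 12, 1), "txt"),
    dated_name("homework", date(2020, 1, 5), "txt"),
]

# With YYYYMMDD, alphabetical order is also chronological order.
for name in sorted(names):
    print(name)
```

With DDMMYYYY the same sort would group files by day of the month instead of by year.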
File Naming - Identifiers
Use unique abbreviations for project names or grants
Do: fhabc_notes.txt
Don't: forest_history_association_of_BC_notes.txt
File Naming - Descriptors
Descriptor should be minimal but unique
Do: fhabc_grantProposal.pdf
Don't: fhabc.pdf
File Naming - Delimiters
Use _ or - to divide your filename elements
Do: fhabc_grantProposal_v01.pdf
Don't: fhabc, grant proposal -->[v01].pdf
In ArcGIS, use only _
File Naming - Versions
Note versions sequentially or with unique date and time
Do: NRC_userGuidelines_v04.doc
Do: MSL-fraserRiverSamples-20200319-0900.csv
Don't: userGuidelines_final_edits_2_forreal.doc
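Both "Do" patterns can be generated rather than typed by hand, which helps keep them consistent. A short sketch, assuming two hypothetical helpers (`versioned_name` and `timestamped_name`) that are not part of any library:

```python
from datetime import datetime


def versioned_name(project, descriptor, version, ext):
    """Sequential version, zero-padded so v04 sorts before v10."""
    return f"{project}_{descriptor}_v{version:02d}.{ext}"


def timestamped_name(project, descriptor, when, ext):
    """Unique date and time stamp in YYYYMMDD-HHMM form."""
    return f"{project}-{descriptor}-{when.strftime('%Y%m%d-%H%M')}.{ext}"


print(versioned_name("NRC", "userGuidelines", 4, "doc"))
print(timestamped_name("MSL", "fraserRiverSamples",
                       datetime(2020, 3, 19, 9, 0), "csv"))
```

The two calls reproduce the example names from this slide.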
File Naming - Other things
- don't start filenames with a number or underscore
- be aware of character limits
- never ever ever use spaces as delimiters
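The file naming rules above can be checked automatically. The sketch below is one possible check, not an official validator: the function name `filename_problems`, the allowed-character pattern, and the 64-character limit are all assumptions chosen for illustration (real limits depend on your operating system and software).

```python
import re

# Assumed rule: letters, digits, _ or - followed by one extension.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")
MAX_LENGTH = 64  # assumed limit for illustration; real limits vary


def filename_problems(name):
    """Return a list of rule violations for one filename (hypothetical helper)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if name[:1].isdigit() or name.startswith("_"):
        problems.append("starts with a number or underscore")
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    if not ALLOWED.match(name):
        problems.append("uses characters other than letters, digits, _ or -")
    return problems


for candidate in ["fhabc_grantProposal_v01.pdf",
                  "fhabc, grant proposal -->[v01].pdf",
                  "_draft.txt"]:
    print(candidate, filename_problems(candidate))
```

An empty list means the name passed every check in this sketch.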
More info can be found using UBC Library's research data planning guidelines.
Attribute Naming Guidelines
best practices again include being consistent, and keeping field names short and descriptive.
it's difficult to briefly describe the output of one or many calculations!
start a codebook if you need to abbreviate
Attribute Naming - Character Length
be aware of limits – Shapefile field names are limited to 10 characters
Do: POPDEN_20
Don't: population_density_2020
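Long field names are worth catching before export, because shortening them afterwards can make different fields collide. A small sketch, assuming a hypothetical helper `too_long` and made-up field names; the truncation shown is a naive cut at 10 characters, and actual software may shorten names differently:

```python
SHAPEFILE_FIELD_LIMIT = 10  # field-name limit for shapefile attribute tables


def too_long(field_names, limit=SHAPEFILE_FIELD_LIMIT):
    """Return the field names that exceed the limit (hypothetical helper)."""
    return [name for name in field_names if len(name) > limit]


fields = ["POPDEN_20", "population_density_2020", "fieldName"]
print(too_long(fields))

# Naive truncation turns two different long names into the same short name.
long_names = ["population_density_2020", "population_total_2020"]
truncated = [name[:SHAPEFILE_FIELD_LIMIT] for name in long_names]
print(truncated)
```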
Attribute Naming - Delimiters
use camelCase when necessary to divide field elements
Do: fieldName
Don't: thisismyfieldname
Attribute Naming - Codebooks
list your field names and labels
provide description and info about each one
describe how values are coded or recorded
keep it up-to-date
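A codebook can be as simple as a CSV file with one row per field. The sketch below is only an illustration: the column names, the helper `codebook_to_csv`, and every value in the rows are made-up placeholders, not a required layout.

```python
import csv
import io

COLUMNS = ["field", "label", "type", "units", "coding"]

# Placeholder entries for illustration only.
codebook = [
    {"field": "POPDEN_20", "label": "Population density, 2020",
     "type": "float", "units": "people per square km",
     "coding": "-9999 = no data"},
    {"field": "landUse", "label": "Land use class",
     "type": "text", "units": "",
     "coding": "R = residential; C = commercial"},
]


def codebook_to_csv(rows):
    """Serialize codebook rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(codebook_to_csv(codebook))
```

Saving this text next to the data, and editing it whenever a field changes, is one way to keep the codebook up to date.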
Structuring Directories
folders organize data for you AND for others
✔️logical
✔️predictable
README files
text files explaining a project or parts of a project so others know what it is
found in top-level directories of projects
can link to other docs or relevant information
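A predictable folder layout and a top-level README can be created in one step at the start of a project. This is a sketch under assumptions: the folder names and the helper `create_skeleton` are examples only, and the demo writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

# Assumed layout for illustration; adapt to your own project.
FOLDERS = ["data/raw", "data/processed", "docs", "scripts"]


def create_skeleton(root, project):
    """Create the folders and a README.txt listing them (hypothetical helper)."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    lines = [project, "", "Folders:"] + [f"- {folder}" for folder in FOLDERS]
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


with tempfile.TemporaryDirectory() as tmp:
    readme_path = create_skeleton(Path(tmp), "fhabc")
    print(readme_path.read_text(encoding="utf-8"))
```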
Version Control
version control software keeps track of file changes
essentially a database of changes
Version Control
different types of systems for different industries
git is very common and widely integrated
GeoGig is emerging but specific to geodata
Data Preservation
data preservation ensures long-term access to and use of data – beyond limits of media
includes procedures regarding file formats, copyright and permissions, persistent storage and geographic location, and metadata.
Data Preservation - File formats
decide which file formats are the most reliable and persistent for your data
prioritize platform-independent, character-based formats
prioritize UTF-8 character encoding
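In practice, this often means saving tabular data as plain CSV and stating the encoding explicitly instead of relying on the system default. A minimal sketch, assuming hypothetical helpers `write_csv` and `read_csv` and a few sample rows chosen because they contain accented characters:

```python
import csv
import tempfile
from pathlib import Path

# Sample rows for illustration only.
ROWS = [["name", "province"], ["Québec", "QC"], ["Trois-Rivières", "QC"]]


def write_csv(path, rows):
    """Write rows as UTF-8 CSV; newline='' lets the csv module handle line endings."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_csv(path):
    """Read the file back using the same explicit encoding."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "places.csv"
    write_csv(csv_path, ROWS)
    print(read_csv(csv_path) == ROWS)
```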
Now let's talk about metadata
metadata describes your data so it can be used, shared, and understood widely
Metadata in Plain Language
Questions you need to be prepared to answer about your data:
USGS Metadata in Plain Language
Examples
metadata formatted for web discovery
xml-encoded metadata
Difficulties
frankly, metadata is pretty boring
it takes a lot of time
lots of standards, no clear best choice
bad metadata negatively affects:
- integrity
- discoverability
- preservability
- usability
4 main metadata "types"
- descriptive
- technical
- discovery
- administrative
Descriptive Metadata
includes things like:
- abstract/methodology
- attribute descriptions
- purpose
- uncertainty and errors
- access
Technical Metadata
includes things like:
- CRS / projection / datum
- attribute data types
- software used
- character encoding
Discovery Metadata
includes things like:
- title
- date
- keywords
- geographic extent
Administrative Metadata
includes things like:
- copyright
- contact info
- status
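To see how the four types fit together, a metadata record can be sketched as a nested structure grouped by type. This is not any metadata standard: the key names, the `REQUIRED` checklist, the helper `missing_fields`, and all values are placeholders invented for illustration.

```python
import json

# Assumed checklist, loosely based on the items listed for each type.
REQUIRED = {
    "descriptive": ["abstract", "purpose"],
    "technical": ["crs", "character_encoding"],
    "discovery": ["title", "date", "keywords", "geographic_extent"],
    "administrative": ["copyright", "contact", "status"],
}

# Placeholder record; "status" is left out on purpose.
record = {
    "descriptive": {"abstract": "Example abstract.", "purpose": "Teaching demo."},
    "technical": {"crs": "EPSG:4326", "character_encoding": "UTF-8"},
    "discovery": {
        "title": "Example dataset",
        "date": "2020-03-19",
        "keywords": ["example", "demo"],
        "geographic_extent": [-123.3, 49.0, -122.5, 49.4],
    },
    "administrative": {"copyright": "Example licence", "contact": "name@example.org"},
}


def missing_fields(rec):
    """List required entries that are absent or empty (hypothetical helper)."""
    missing = []
    for group, keys in REQUIRED.items():
        for key in keys:
            if not rec.get(group, {}).get(key):
                missing.append(f"{group}.{key}")
    return missing


print(missing_fields(record))
print(json.dumps(record, indent=2))
```

A check like this only confirms that something was entered; whether the description is accurate still depends on the person writing it.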
Metadata Standards
why have metadata standards?
- ease transformation/conversion
- ensure proper interpretation
Metadata Standards
2 main geospatial metadata standards
Metadata Standard - ISO
flexible and internationally recognized
generally recommended
complex
documentation costs money
don't worry! there are several tools to help you create and edit metadata!
Metadata tools and editors
ArcGIS Pro!
CatMDEdit
mdEditor.org (beta)
GeoNetwork
and more!!
creating metadata can be tedious. But remember: metadata will make your data more reproducible, sharable, and impactful.
motivational quote:
metadata is a love note to the future
Thanks!
Evan Thornberry
evan.thornberry@ubc.ca