Folklore on the Internet.
Asset? Liability? Opportunity? Threat?

Peter Millington

Notes for a Seminar Presented at NATCECT, University of Sheffield, December 2003



The Internet is often viewed as a giant online repository for information – an alternative to, if not a replacement for, conventional hard-copy resources. However, it is more than that, in that it provides new ways of finding and presenting information. It should be a boon to research, but is it? Well, the answer is yes, but there are qualifications – pitfalls for the unskilled searcher and uncritical reader. There are questions regarding the long-term preservation of online sources, and then there are more subtle issues. For instance, digitising a manuscript or photographic archive can eliminate wear and tear on fragile originals, and make them more readily available. However, once they are digitised, there is a danger that the role of the original archives and their archivists is diminished, if not threatened. I will outline the issues involved, using case studies from folklore, as a prelude to an exchange of our personal experiences with the Web.


  • Benefits and Pitfalls of the Internet
  • Taking my keywords in turn
  • Using folklore for examples and case studies
  • Hope to stimulate discussion & exchange of experiences

My background

  • 4 years managing
    • Trying to keep up with best practice
    • Experience of other sites through the Links pages
    • Noticing trends
  • 30 years as an Information Scientist in Industry
    • MCLIP – Member of the Chartered Institute of Library & Information Professionals
    • The technology is secondary to the information

The Internet as an Asset

All that information out there on the Web

  • Primary Information
    • Personal reminiscences - Personal sites, Weblogs
    • News pages, events lists, etc.
    • Photographs, audio, video, etc.
  • Online versions of hard-copy resources
    • Full-text transcriptions of paper publications
    • Electronic journals
  • Databases
    • Bibliographic and Archive databases
    • Full-text databases
  • Specialist websites
  • Online Discussion Groups, etc.

Internet Assets - Folklore

How to Find Information

Search Effects in Google, AltaVista, etc

  • Similar queries yield radically different results
  • Effects of...
    • Plurals
    • Quoted phrases
    • Word stemming, etc.
  • Users rely on rankings
    • Say, the top 10-30 hits
    • Any lower and a site is unlikely to be seen
  • Rankings reflect popularity & marketing
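
The effects of plurals, quoted phrases and word stemming listed above can be illustrated with a toy matcher. This is only a minimal sketch: the sample documents are invented, the stemmer is deliberately crude, and none of it reflects how Google or AltaVista actually work internally.

```python
import re

# Invented sample documents, for illustration only.
DOCS = [
    "The mummers' play was performed at Christmas.",
    "A mummer's play from Sussex.",
    "Mummers perform plays in many villages.",
    "Texts of mummers' plays and other folk plays.",
    "The play was written by the mummers' leader.",
]

def tokens(text):
    """Lower-case words, keeping an apostrophe inside or at the end of a word."""
    return re.findall(r"[a-z]+(?:'[a-z]*)?", text.lower())

def stem(word):
    """Very crude stemmer (an assumption): drop apostrophes and a final 's'."""
    word = word.replace("'", "")
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def match_all_words(query, doc):
    """Unquoted query: every query word must occur somewhere in the document."""
    doc_tokens = set(tokens(doc))
    return all(word in doc_tokens for word in tokens(query))

def match_phrase(query, doc):
    """Quoted query: the words must occur together and in order."""
    q, d = tokens(query), tokens(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

def match_stemmed(query, doc):
    """Unquoted query with stemming: plurals and possessives are conflated."""
    doc_stems = {stem(t) for t in tokens(doc)}
    return all(stem(word) in doc_stems for word in tokens(query))

def hits(query, matcher):
    """Number of sample documents that satisfy the query."""
    return sum(matcher(query, doc) for doc in DOCS)

if __name__ == "__main__":
    for query in ("mummers' play", "mummers' plays", "mummer's play"):
        print(query,
              hits(query, match_all_words),
              hits(query, match_phrase),
              hits(query, match_stemmed))
```

On this sample, the unquoted query mummers' play finds two documents and the quoted phrase only one, while the stemmed form of mummer's play finds all five. The same kind of spread, on a much larger scale, is what the hit counts for similar queries show.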

Search Effects - Google Hits

Query               Hits       Query                Hits
mummers             25,100
mummers' play        8,600     mummers' plays       4,380
"mummers' play"      2,190     "mummers' plays"     1,070
mummer's play        2,080     mummer's plays         937
"mummer's play"        583     "mummer's plays"       151

Ranking of

Query               Rank       Query                Rank
mummers             >100
mummers' play         17       mummers' plays         16
"mummers' play"       27       "mummers' plays"       15
mummer's play          8       mummer's plays         11
"mummer's play"        8       "mummer's plays"       11

What’s in front of

  • 1. General Christmas site - run by Staffs school
  • 2. Drama history book
  • 4-5. Excellent Sussex site
  • 9. Ashdown Mummers - still hawking pagan rituals
  • 10. Martin Collins’ script collection - dodgy quality
  • 11-12. Green Man Mummers - New Zealand
  • 14. Norse Pagan re-enactment site
  • Rest mostly one-location sites

And just after…

  • 17. Modern compositions - Wedding & New Brunswick
  • 19. Article in Brazilian Portuguese

Observations on the Ranking of

  • is the best folk play site
    • Quantity and quality of information. High traffic
    • Should be near the top of the list
  • Why so low down?
    • Avoiding the non-preferred term "Mummers’ Play"
    • Dilution of links from other sites - 3 different URLs
  • Why are other sites in front?
    • Heavy use of the term "Mummers’ Play"
      - Less dilution with alternative terms
    • Older sites - More time to acquire external links

Databases - Searching for Texts

  • Literary Texts - poetry, prose, drama, etc
  • Folk Song Lyrics
  • Digitrad Lyrics Database -
    • Full-text searching
    • Variable quality and authenticity
  • Combined search: Digitrad or general web search, then Bodleian

Variability of Oral Texts

Behold a lady bright and gay
Behold now my lady
Behold the lady bright and gay
I am a lady bright and fair
I am a lady bright and gay
I’m a lady bright and gay
In comes a lady bright and gay
In comes I, a lady bright and gay
In comes I, the lady bright and fair
In comes the lady bright and gay
To see a lady bright and gay
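
Because oral texts vary like this, an exact-match search for any one wording will miss most of the others. The sketch below shows one possible way round this, scoring each variant by the proportion of words it shares with the query (Jaccard similarity). The query line, the 0.5 threshold and the function names are all assumptions made for illustration; this is not how Digitrad or any other database is known to search.

```python
import re

# The variant lines listed above.
VARIANTS = [
    "Behold a lady bright and gay",
    "Behold now my lady",
    "Behold the lady bright and gay",
    "I am a lady bright and fair",
    "I am a lady bright and gay",
    "I’m a lady bright and gay",
    "In comes a lady bright and gay",
    "In comes I, a lady bright and gay",
    "In comes I, the lady bright and fair",
    "In comes the lady bright and gay",
    "To see a lady bright and gay",
]

def words(line):
    """Set of lower-case words, treating curly and straight apostrophes alike."""
    return set(re.findall(r"[a-z']+", line.lower().replace("’", "'")))

def jaccard(a, b):
    """Shared words divided by all distinct words in the two lines."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)

def similar_variants(query, threshold=0.5):
    """Variants scoring at least `threshold`, best match first."""
    scored = [(jaccard(query, v), v) for v in VARIANTS]
    return [v for score, v in sorted(scored, reverse=True) if score >= threshold]

if __name__ == "__main__":
    query = "Here comes a lady bright and gay"  # hypothetical wording, not in the list
    print(query in VARIANTS)                    # exact matching finds nothing
    for line in similar_variants(query):
        print(f"{jaccard(query, line):.2f}  {line}")
```

Word overlap is a blunt instrument, since it ignores word order and spelling variation, but it is enough to show why full-text searching of oral material needs something more forgiving than exact phrases.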

Genealogical Investigations

  • Investigating Informants & Participants
    • e.g. Keith Chandler’s work on Cotswold morris
  • Web Resources

Example Genealogical Investigation

  • Cornish folk play script
    • Supposedly from Mylor, late 19th century
    • Orthography suggested late 18th/early 19th century
    • List of actors – e.g. Pentecost Langdon, Henry Solomon, etc
  • Investigation
    • Search for actors’ names in
    • Found a generation born c.1770 in Truro, suggesting performance late 1780s
  • Follow-up and outcome
    • Confirmation in Cornish archives - The actors were all Cordwainers
    • Rediscovered the manuscript - Consistent with the proposed new date
    • Paper published in Folklore, April 2003
  • Would not have happened without the Web

Other Finding Methods

  • Online Directories
    • Which headings should folklore appear under?
      • Arts & Humanities?
      • Religion?
      • Entertainment?
      • Regional? etc.
    • Patchy coverage - Relies on webmasters submitting their sites
    • Uncritical inclusion
  • Portals
    • Effectively specialist directories
    • Comprehensive and/or critically selective

The Internet as a Liability

  • Text as Images
    • Computers cannot read words in an image, so cannot search them
  • Quality Issues
    • Including plagiarism and faction
  • Long-term Viability of Websites
    • Including archiving
  • Creating work
    • Queries from the public

Images versus Full-text

  • Images of text
    • More feasible with diminishing storage costs
    • Relatively quick to produce - scan + catalogue
    • Limited searching - catalogue data only
  • Full-text
    • Eminently searchable
    • Slow - even with OCR - due to quality checks
    • Illustrations may be omitted
  • Both
    • Search on text - display the image
    • Can tolerate poorer quality text transcriptions
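
The "Both" approach, searching on the text but displaying the image, can be sketched as follows. The file names and the sample text are invented, and a real system would use a proper search index rather than a simple scan; the point is only that the uncorrected transcription is used to find the page, while the reader is shown the scan.

```python
from dataclasses import dataclass

@dataclass
class Page:
    image_file: str  # scanned image that is shown to the reader
    ocr_text: str    # uncorrected machine transcription, used only for searching

# Hypothetical pages; the second contains a typical OCR error ("b0ttle").
PAGES = [
    Page("page_001.png", "In comes I, the Turkish Knight"),
    Page("page_002.png", "In comes the Doctor, with his b0ttle"),
    Page("page_003.png", "Here comes I, Beelzebub"),
]

def search(term):
    """Return the image files of all pages whose transcription contains the term."""
    term = term.lower()
    return [page.image_file for page in PAGES if term in page.ocr_text.lower()]

if __name__ == "__main__":
    print(search("doctor"))  # the page is found despite the error elsewhere on it
    print(search("bottle"))  # a word damaged by OCR is not found
```

Since the reader sees the original image rather than the transcription, errors in the text only affect whether a page is found, not what is read. That is why poorer transcriptions can be tolerated, up to a point.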

Quality Issues

  • Anyone can publish online
    • No peer review
    • Information and/or ideas may be out of date
    • Info may be inaccurate - e.g. bad transcripts
    • Plagiarism
    • Faction - Fiction presented as plausible fact
  • Is this different from paper publication?
    • Easier to do
    • Cheaper production costs
    • Exposure to a much wider audience

Addressing Quality Issues

  • Lack of peer review
    • Establish peer-reviewed E-Journals & resources
  • Out of date Information & ideas
    • Challenge the authors - sources & justification
    • Direct them to valid sources of information
  • Plagiarism
    • Challenge the author - or the webmaster’s ISP
    • Seek acknowledgement or deletion of lifted material
  • Faction
    • Ask for a clear "health warning" to be added
  • Inaccuracy
    • Feed back corrections to the webmaster/author

Website Viability Problems

  • Resource Issues
    • Reliance on a single webmaster
      - What if they change jobs, lose interest, or die?
    • Too few contributors
  • Infrequent updates
    • "Static" websites tend to be penalised
  • Disappearing websites
    • Site deleted - Expired/Cancelled subscriptions, Closure of host, etc.
    • Moved sites - Webmaster changes ISP or moves to new employer
    • Transient pages - especially news & events

Addressing Viability Problems

  • Resource Issues
    • Reliance on a single webmaster
      • Document layout standards, technical procedures
      • Train backup webmaster(s)
      • Use a content management system
    • Too few contributors
      • Maintain the network - Online & offline communication is key
  • Infrequent updates
    • News sections, events lists, etc
  • Lost websites (and email addresses)
    • Create archive backups of sites, just in case
    • Search by full title for new location
    • - "Take me back" database for lost & superseded web pages
    • Legal Deposit Libraries Act 2003

Internet Opportunities

  • Publishing Online
    • Adding functionality to text and graphics
  • Collecting Online
    • The benefits of online collecting slips
  • Digitising Primary Sources
    • To reduce wear and tear, and improve access
  • Data Standards
    • To facilitate information exchange and analysis

Publishing Online

  • Offers more than text and graphics
  • Include Multi-media
  • Hyperlinks
    • Active contents pages
    • Active references - Links to full-text sources, not just references
  • Not written in stone. Corrections can be made and supplements added
  • Example online folklore conference paper - "This is a Mummers’ Play I Wrote"

Collecting Online

Online Collecting Slip

    • Draft Flora Sheffielder collecting slip
  • Technical Benefits
    • Enforcing obligatory fields
    • Restricted multi-choice options
    • Cloning Personal Details for multiple submissions
  • Potential
    • Image queries - "What do you know about this?"
    • Informants’ attachments - photos, images, texts
    • Better handling of conditional questions
    • Possibility of computer assisted filing, analysis, etc
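
The technical benefits listed above can be sketched in a few lines. The field names, the permitted options and the function names below are all hypothetical, chosen only to illustrate the idea; they are not taken from the draft collecting slip.

```python
# Hypothetical field definitions for an online collecting slip.
REQUIRED = ("informant_name", "location", "description")
CHOICES = {"source": {"own experience", "family", "book", "other"}}
PERSONAL = ("informant_name", "email")

def validate(slip):
    """Return a list of problems; an empty list means the slip can be accepted."""
    errors = []
    # Obligatory fields must be present and not blank.
    for field in REQUIRED:
        if not str(slip.get(field, "")).strip():
            errors.append(f"{field} is required")
    # Multi-choice fields may only hold one of the permitted options.
    for field, allowed in CHOICES.items():
        if field in slip and slip[field] not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    return errors

def new_slip_from(previous):
    """Start a fresh slip, cloning only the informant's personal details."""
    return {field: previous[field] for field in PERSONAL if field in previous}

if __name__ == "__main__":
    first = {
        "informant_name": "A. N. Other",
        "email": "another@example.org",
        "location": "Sheffield",
        "description": "Example entry",
        "source": "family",
    }
    print(validate(first))
    second = new_slip_from(first)
    print(second)
    print(validate(second))
```

A paper slip can be returned half-completed; an online one can refuse to be submitted until the obligatory fields are filled in, and can save a regular contributor from retyping their details each time.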

Digitising Primary Sources

  • Types of source
    • Unique archive sources
      • Manuscripts, fieldwork data, official records, etc
    • Rare and fragile sources
      • Early books, early audio & cinematic media, ephemera, etc
    • Special Collections
      • Folk drama, J.M.Carpenter, Bodleian Broadside Ballads, etc
  • Approaches
    • Full-text transcriptions
    • Images of text
    • Graphic images
    • Sound
    • Moving images
    • Etc.
  • Impact
    • Improved searching – Especially full-text searching
    • Conservation and security of originals
    • Better than microfilm
    • Wider availability - more than one location

Data Standards

  • Mark-up formats
    • SGML, HTML, TEI, EAD, XML, etc.
  • Merging data
    • Unified databases
      - e.g. Access to Archives (A2A), COPAC, etc
    • Disparate sources and different document types
      - e.g. LION – Literature Online
  • Impact
    • Easier data-sharing – Little or no reformatting
    • Even greater accessibility
    • Sharing software tools - e.g. phonetic searching
    • Data mining – going beyond information retrieval
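
Phonetic searching is a good example of a shareable tool, since it is useful wherever names were written down by ear. The sketch below uses the classic Soundex code as one well-known phonetic method; whether any particular archive or database uses Soundex is an assumption, and the variant spelling "Langden" is invented for illustration.

```python
# Soundex digit for each coded consonant; vowels and y are not coded.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(name):
    """Four-character Soundex code: first letter plus up to three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels give "", which ends a run of repeats
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]

if __name__ == "__main__":
    for name in ("Langdon", "Langden", "Solomon", "Soloman"):
        print(name, soundex(name))
```

Names that share a code can then be retrieved together, so a search for Langdon would also bring up a record spelt Langden. The cost is some false matches, which the searcher has to weed out.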

Benefits from Coding/Mark-up

  • Computers are hard task-masters, so...
    • Better quality assurance
    • Less fudge
  • Lots of little decisions add value
    • e.g. Distinguishing between different types of personal name - Author, informant, collector, etc.
    • More informative data
    • Sometimes unexpected discoveries
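
As an illustration of how one of these little decisions adds value, the sketch below marks up each personal name with its role and then retrieves only the informants. The element and attribute names are invented for this example and do not follow TEI, EAD or any other published scheme; the names are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: each personal name carries an explicit role.
RECORD = """
<record>
  <title>Example play text</title>
  <name role="author">A. Author</name>
  <name role="collector">C. Collector</name>
  <name role="informant">I. Informant</name>
  <name role="informant">J. Informant</name>
</record>
"""

def names_by_role(xml_text, role):
    """Return the names in the record that are tagged with the given role."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(f"name[@role='{role}']")]

if __name__ == "__main__":
    print(names_by_role(RECORD, "informant"))
    print(names_by_role(RECORD, "collector"))
```

Without the role, a search for a name could not tell whether the person wrote, collected or supplied the material. Having to decide the role for every name is exactly the kind of discipline that improves the data.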

The Internet as a Threat

  • Loss of hard-copy sources
    • Paper journals already being discarded
    • Why keep anything on paper?
    • Could the online data be destroyed?
  • Licensing
    • Paper is permanent. Licences are fixed term
    • Access restrictions for licensed sources - disenfranchising some users
  • Sidelining archivists, experts, etc.
    • Less need for personal contact with enquirers
    • If it’s online, why visit the archive or library?

Summary - Positives

  • Makes information more widely available
  • Makes information more searchable
  • Less reliance on discrete physical locations
    • Less travel to distant archives and libraries
    • Can work from your own desk – even at home
    • Available "out of hours"
  • Major time savings
  • Adds new functionality to normal media & methods
  • Facilitates the exploration of new ideas
  • Makes things possible – e.g. investigating hunches

Summary - Negatives

  • Search vagaries
    • Effective search terms
    • Ranking results
  • Variable quality of information
  • Long-term preservation of online data
  • Decline of hard-copy sources
  • Disenfranchisement of certain users
  • Sidelining professionals

© Copyright 2003-2004 Peter Millington. Last Updated: 09-Apr-2016