Text Mining – THATCamp British Library Labs
http://britishlibrarylabs2015.thatcamp.org
13 February 2015, British Library Conference Centre

BL Labs Competition – Tips, Advice and Looking at BL Data
http://britishlibrarylabs2015.thatcamp.org/2015/02/13/bl-labs-competition-tips-and-advice-and-looking-at-bl-data/
Fri, 13 Feb 2015 07:14:54 +0000

The new British Library Labs competition for 2015 is live and closes on 29 April 2015. It invites anyone to propose an idea for what they might do with British Library digital content. We will choose two ideas by 29 May 2015. If yours is chosen, you will work from June to October 2015 as a ‘researcher in residence’ at the British Library (expenses paid up to £3600) and showcase your work on 2 November 2015, when you could win a first prize of £3000.

Previous winners have included:

  • how to make statistically representative samples from our book collections (Pieter Francois’s Sample Generator)
  • applying the intuitions of a DJ to working with digital collections (Dan Norton’s Mixing the Library, Information Interaction and the DJ)
  • linking digitised handwritten manuscripts to transcribed texts in a visually appealing way (Desmond Schmidt and Anna Gerber’s Text to Image Linking Tool)
  • finding Victorian jokes in our digital archives, creating a database of Victorian humour and attempting to make Victorian jokes funny again over social media (Bob Nicholson’s Victorian Meme Machine)

This workshop will include an overview of the competition and give advice and tips on the application process, with a question, answer and discussion ‘clinic’.

This will then be followed by a look at some of the digital data we have available, either to shape your ideas or to inspire you to come up with an exciting new one, whether or not you want to enter our competition. What we have learned more than anything is that people’s ideas change once they see the digital content we have.

So if this session is chosen, we will give you wireless access to our shiny new mini Network Attached Storage (NAS) device, which holds around 8TB of data. We will give you a walkthrough of what’s on there, and then you will have a chance to explore and investigate it and, more importantly, grab what you want! Our NAS box contains:

  • 3 million catalogue records from the British and Irish national library catalogues
  • 107,000 digitised playbills from 1602 – 1902
  • 1 million images from our Flickr release, including metadata, user-generated tags for around 70,000 images, over 3000 georeferenced maps, and OCR text from all the books (22 million pages)
  • Metadata from Image, Sound, Media, Electronic journals collections.

Look to see what’s on our shiny new mini-NAS!

Don’t miss this opportunity: make sure you vote for this session!

Getting experienced
http://britishlibrarylabs2015.thatcamp.org/2015/02/05/getting-experienced/
Thu, 05 Feb 2015 14:20:56 +0000

Digitised literature gives us the opportunity to explore how people experienced the past. However, if extracting tangible things from texts such as names and places can be tricky, consider the challenge of extracting something intangible such as an experience.

If the author has been kind to us, we can set our computer to look for keywords in the text such as ‘read’ and ‘listen’. We can use those words as cues to locate the description of an experience. For example, an officer in the Western Front trenches might describe the solace he finds when he reads Jane Austen; but what if his diary simply states that he grabbed Austen from his pack for some solace? How can we program a computer to extract that as an experience from the text, regardless of how the sentiment is expressed?
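The keyword-cue idea can be sketched in a few lines of Python. This is only an illustration, not a tool from our projects: the cue list, the function names and the two sample sentences are all made up for the example, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical cue words; a real study would choose and extend these per corpus.
CUE_PATTERN = re.compile(
    r"\b(read|reads|reading|listen|listens|listened|listening)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_candidates(text):
    """Return (sentence, cues) pairs for sentences containing at least one cue word."""
    candidates = []
    for sentence in split_sentences(text):
        cues = [m.group(0).lower() for m in CUE_PATTERN.finditer(sentence)]
        if cues:
            candidates.append((sentence, cues))
    return candidates


# Invented sample text: the first sentence uses a cue word, the second does not.
sample = (
    "In the evening he reads Jane Austen and finds some solace. "
    "Today I grabbed Austen from my pack for some solace."
)

for sentence, cues in find_candidates(sample):
    print(cues, "->", sentence)
```

Run on the sample, the sketch flags only the first sentence. The second sentence describes the same kind of experience but contains no cue word, so it is missed, which is exactly the gap this session wants to discuss.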

In this session, we would like to explore the challenge of extracting experience, in whatever form, from digital texts. Can we afford to have our valuable humanists wading through reams of data before they can get to grips with the real purpose of their study, or can we get the computer to tackle the issue of finding candidate experiences that merit close reading?

While our own immediate interests lie in historic literature, we believe the underlying challenge is equally applicable to extracting experiences from other digital sources up to and including contemporary social media.

The primary aim of the session is to discuss the challenge, and from that seek to establish some general principles to automate the identification of ‘experience’ in digital texts. We can initiate the discussion drawing from our own work with reading experience (www.open.ac.uk/Arts/reading) and listening experience (www.open.ac.uk/Arts/LED).

If we make good progress with the discussion, and if time permits, we can extend the session to try out some of the ideas and any tools that may already exist. We will bring along some of our data and a suitably powerful laptop to start the work off, but we would love to see more and varied data, and any existing tools, to help make progress on this challenge.

Enriching digitised archival content
http://britishlibrarylabs2015.thatcamp.org/2015/02/03/enriching-digitised-archival-content/
Tue, 03 Feb 2015 16:19:30 +0000

Over the past year I have been project managing the content workstream of the British Library’s flagship digitisation programme with the Qatar Foundation, now launched and live on a bilingual free-to-use portal at www.qdl.qa.



This service offers a unique resource on the history of the Gulf region, drawing on primary collections at the British Library including the India Office Records, visual arts, digitised sound collections and the maps collections. You can learn more at www.qdl.qa/en/about.

My interest in this session is to riff with some digital tools proposed by the group for enriching the content without any further development to the portal. For example, that might mean:

Or many other ideas besides. I do hope you’ll join and contribute. Here’s a video to help you decide:
