Diving into Domains, Documents and Digital Ecosystems

CityLIS Term 1 Week 5. In which we dipped into domain analysis before going fully immersive; we practised techniques for collecting and archiving tweets as a prelude to visualising and analysing them; intrepid citylisters took field trips to Highgate Cemetery (check out the DITA blogosphere for some interesting blogposts on this) and screened The Internet’s Own Boy; I investigated how big worlds can actually be quite small; we learnt about storing digital assets in repositories and what happens when you set them free; and we explored what makes good communication (written and oral) and how to deal with the parts we find uncomfortable.

Catching Waves

In DITA this week we explored issues around researching social media. Ernesto compared this to pinning butterflies. That metaphor makes me think more of capturing a specimen from the vortex of ideas this course unleashes and pinning it into my dissertation, so I’m using the metaphor of catching waves instead. Forever rolling against the sands of time (and entropy), collecting and analysing social media feels like trying to map patterns in the shifting tides and waves that lap against our shores. So much of what we see is on the surface and ephemeral. This week’s session helped us venture into the deep. My submersible for this expedition was a Twitter API application I called DITA Venturi. I initially thought of this merely for its connotation with venturing but then I discovered the Venturi effect and realised I’d managed to traverse quite aptly from thermodynamics last week to fluid dynamics this week. Apparently the Venturi effect can convert pressure into suction, and Venturi also invented a device for measuring flow through a pipe. A fitting analogy for sticking an application into the Twitter stream and trying to analyse its flow and extract it for posterity.

We learnt about two possible data formats for APIs, XML and JSON, and noted that XML’s qualities make it more suited to documents whilst JSON’s simpler model of key-value pairs and arrays makes it good for small chunks of data. It is JSON that is returned by Twitter API endpoints, and we then used Martin Hawksey’s TAGS Google script to extract the results of a Twitter search into a Google spreadsheet using our Twitter applications. This provides a one-off or ongoing capture of tweets and all the power of spreadsheet analytics for interrogating that Twitter archive, including ready-made summaries and graphs. Hawksey has also built some great tools for visualising the Twitter archive in different ways, such as TAGS Explorer (you can try this with the demo spreadsheet that is provided by default). This week’s DITA blog on putting this together isn’t due until after reading week so I’m going to wait until I’ve attended the British Library Labs Symposium on Monday and use #bl_labs as my case study.
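To make the XML and JSON comparison a little more concrete, here is a minimal Python sketch that reads the same made-up tweet in both formats using only the standard library. The field names (`user`, `text`, `hashtags`) and the values are my own illustrative assumptions; this is not the real Twitter API payload, which is far larger.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified record for illustration only;
# the real Twitter API response has many more fields.
json_text = (
    '{"user": "citylis_student", '
    '"text": "Archiving tweets with TAGS", '
    '"hashtags": ["citylis", "bl_labs"]}'
)

# The same made-up record expressed as XML.
xml_text = """<tweet user="citylis_student">
  <text>Archiving tweets with TAGS</text>
  <hashtags>
    <hashtag>citylis</hashtag>
    <hashtag>bl_labs</hashtag>
  </hashtags>
</tweet>"""

# JSON maps straight onto dictionaries (key-value pairs) and lists (arrays).
tweet = json.loads(json_text)
json_hashtags = tweet["hashtags"]

# XML is a tree of elements with attributes, so we walk it to find the values.
root = ET.fromstring(xml_text)
xml_hashtags = [element.text for element in root.iter("hashtag")]

print(json_hashtags)
print(xml_hashtags)
```

Both routes end up with the same list of hashtags. The JSON version is one lookup on a dictionary, whereas the XML version means navigating a document tree, which is roughly the trade-off we discussed in class.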

This was all pretty cool and also beautiful. Data visualisation is spectacular and artistic. What I haven’t been able to make the leap to yet is what insight it gives. I can understand archiving tweets. The Twitter search API only returns tweets from roughly the previous 7 days; after that they become much harder to retrieve from Twitter’s vast and commercially valuable data vaults. Capturing tweets provides a handy corpus that researchers can go back and consult, but I cannot yet understand what TAGS Explorer is telling me. What does data visualisation add, and how do we approach using this corpus for meaningful research rather than just because it’s interesting? We will pick up where we left off after reading week so I look forward to finding out.

The Science of Small Worlds

It’s quite good that I’m behind on the University of Southampton’s Web Science MOOC (#FLwebsci) as this week’s topic of using network theory to analyse social networks really complemented DITA thinking. This week we looked at network properties and scale-free, small-world networks … like the web. These are networks where most nodes have very few connections but a few nodes, known as hubs, have huge numbers of connections. This network pattern makes even global networks ‘small’ because most nodes can be connected by a path containing a small number of ‘hops’ between nodes. This is typically around 6, leading to the phrase “six degrees of separation”. This video from PBS Nova explains how social networks look and how this pattern is replicated across many natural and human networks.
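The effect of hubs on path length can be sketched in a few lines of Python. This is a deliberately tiny toy network of my own devising (three hubs linked to each other, each with ten single-connection nodes), not a real scale-free network and not anything from the MOOC; the function names and sizes are assumptions made for illustration. It uses breadth-first search to count the hops between every pair of nodes.

```python
from collections import deque
from itertools import combinations


def build_hub_network(num_hubs=3, leaves_per_hub=10):
    """Toy network: hubs are all linked to each other; each leaf links to one hub."""
    hubs = [f"hub{i}" for i in range(num_hubs)]
    graph = {hub: set() for hub in hubs}
    # Connect every pair of hubs.
    for a, b in combinations(hubs, 2):
        graph[a].add(b)
        graph[b].add(a)
    # Attach leaves, each with a single connection to its hub.
    for hub in hubs:
        for j in range(leaves_per_hub):
            leaf = f"{hub}_leaf{j}"
            graph[leaf] = {hub}
            graph[hub].add(leaf)
    return graph


def hops_from(graph, start):
    """Breadth-first search: fewest hops from start to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


graph = build_hub_network()
degrees = {node: len(links) for node, links in graph.items()}
# Hop counts for every ordered pair of distinct nodes.
all_hops = [h for node in graph for h in hops_from(graph, node).values() if h > 0]
max_hops = max(all_hops)
average_hops = sum(all_hops) / len(all_hops)

print("nodes:", len(graph))
print("largest number of connections:", max(degrees.values()))
print("longest shortest path (hops):", max_hops)
print("average hops:", round(average_hops, 2))
```

In this toy case most nodes have a single connection, yet no two nodes are more than three hops apart, because every route can pass through the hubs. Real networks are messier, but the same shortcut effect is what keeps them ‘small’.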

I also watched this RSA Animate short on the Power of Networks, which provides a great visual accompaniment to an article by our tutor Lyn Robinson, along with Mike Maguire, on using the tree and the rhizome as metaphors for patterns of information organisation. The tree view of knowledge classification comes from the Aristotelian tradition of branching hierarchies; the rhizome was a term developed by philosophers Deleuze and Guattari to describe an organisational model based on a continually shifting set of connections between things. The tree is like a narrative; the rhizome is a map for a constantly shifting world.

Seeding Knowledge by Ceding Control

James Baker from the British Library came to talk to us in Information Management and Policy this week about his job as a Curator in Digital Research, giving us a preview of some aspects of the library’s experimental work that may feature at the British Library Labs Symposium on Monday. Digital Research is exploring digital collections beyond resource discovery, moving towards research at scale and lowering the barriers for digital researchers. The library’s legal deposit has been extended to UK-published websites so the library can now archive born-digital resources.

Some Examples:

(1) Personal Lives: From Letters and Diaries to Computer Forensics

The transition from letters and personal correspondence to Digital Lives has implications for archiving. The British Library is interested not just in content as received on computers but in performing forensic analysis on hard disks to understand “the life of how someone interacts with the machine”. This raises data protection issues, so it is hard to make this collection public.

(2) Infectious Texts

Combining text mining and close reading to map networks of re-printing in 19th-century newspapers and magazines (a kind of historical version of what we are doing in DITA with Twitter data).

(3) The Mechanical Curator

This project released over one million images from within 65,000 books digitised as part of the Microsoft Books project. Initially they were posted on Tumblr, then Twitter, then the whole collection was loaded onto Flickr (with metadata also available on GitHub) under a CC Zero (public domain) licence.

| “We enjoyed losing control of the collection”

James listed some of the remixing and interactions that the experiment has spawned: teaching (learning about curation), hacking, experiments, #immersive adaptations and incorporation into Wikimedia. Using web infrastructure and UX “off the shelf” they were able to experiment with things that would be impossible or prohibitively expensive to do with BL systems.

Some Questions/Issues to Negotiate:

  • Derived Data: what to do about data built on data, additional metadata and potentially incorrect data
  • Remixed Collections: what happens when images are decontextualised
  • Reintegration: incorporating user generated data back into BL collections

Collections 2.0

This made me think more about how the nature of collections and research may both change if digital collections become more open and extensive, connecting with some of our DITA themes.

We are Digital Makers: in a more participatory web architecture and culture we all have the opportunity to curate and create our own ideas and projects from raw digital material released by libraries into the public domain.

Hacking research: uses of collection data outside ‘serious scholarship’:

  • community cataloguing and classification
  • art
  • machine learning
  • education
  • entertainment

What is the “role of the curator?”

James is a curator and part of the experiments also involves thinking about how curation might evolve as a result.

| “How do we manage this dispersal?”

It sounded to me like seeding an ecosystem (by ceding control), a different and more diverse role for a curator than the traditional one of managing a collection. It made me think of Hans Rosling describing public data in his TED Talk The Best Stats You’ve Ever Seen.

| “But this is what we would like to see, isn’t it? The publicly-funded data is down here. And we would like flowers to grow out on the Net.”

James spoke of a spectrum of information control from authority and finality (an institutional mindset?) to adaptability and evolution (a hacker mindset?).

This raises further questions like:

  • understanding and tackling the issues that arise when information bridges different spheres
  • what is the role of the library along this spectrum?

Thanks to James for coming along and sharing his insight and some of the British Library’s Digital Research ideas and experiments with us. You can take a look at James’ presentation on Slideshare.

On Communication

In RECS this week we discussed communication, both oral and written. This was an interactive and humorous session brimming with anecdotes and views on what makes good and bad writing and presenting. When I prepared for this session I thought about people like Hans Rosling, Daniel Kahneman, Tony Judt, Roger Deakin, Geert Mak, Hilary Mantel and David Attenborough. I think of being absorbed by their calm authority and their skill in distilling complex subjects into clear, simple prose. They have the quiet confidence that those who don’t see will see. They dive beneath the froth and foaming waves at the surface and guide you into quieter, deeper territory towards something more profound. Like skilful divers they have mastered neutral buoyancy and have the balance, control, technical proficiency, knowledge and experience to achieve this equilibrium. More than individuals and their ability, I thought of how good communication makes me feel. It is about transmitting the joy and awe of rising above and standing at the summit of a mountain, seeing a vista clearly laid out before you as you have never seen it before.

Yet most of us find these skills difficult and uncomfortable.  So this session was designed to help us explore and confront the good, the bad and the ugly.  Afterwards I compared the discussion we had on the art of speaking and writing with ease with my constant attempts to improve as a runner and wrote myself some motivational guidelines that might help with both!

Full Immersion

In LISF this week Lyn Robinson took us right to the cutting edge and spoke to us about her recent conference paper at Internet Librarian 2014 on immersive documents (see also her blog post), potentially a future development in the history of documents as we shift to an increasingly digital and multimedia world. Both immersion and submersion derive from the same Latin verb meaning to dip, soak or plunge. Immersive unreality refers to virtual worlds that are so real they are perceived as real. Lyn located this type of document emerging from the nexus of pervasive networked computers, multisensory multimedia and participatory interaction. At the moment this is most often tied to gaming or fan fiction, but if this kind of transmedia document becomes more prevalent, what are the implications for libraries and information centres? If the British Library is navigating the shift from letters to personal computers and from book deposit to born digital, and researchers are struggling to capture and interrogate social networks, what on earth would a library or archive of immersive documents look like?

These are early days. There are no immersive documents yet, but there are some great examples from fiction of what they might be and some interesting prototypes emerging, e.g. The Craftsman. Immersive documents need new forms of creative writing and new forms of design for transmedia, and they need hardware, narrative form and content producers (currently developing at different speeds) to converge. They also need to go through the technology adoption curve and make the leap from early adoption to mainstream use. Part of me remains suspicious that if you asked the majority to choose between passive and participatory they would choose passive.

This session did make me reminisce wonderfully about the Fighting Fantasy series of novels. Who read these without bookmarking the previous branch with a finger in case they had made a wrong turn? These were individually participatory and gave the reader some agency in determining the outcome through the branches. I guess we are back to the tree and the rhizome again: digital immersive documents probably offer much more, making this less a branching narrative and more an evolving one, and also more real than leaving your fingers in three different places to check that your decision hasn’t made you dead yet, so you can go back and explore an alternative story if you’ve been stupid.

There are going to be ethical and cultural issues if this form takes off:

  • what are the privacy implications?  Bad enough surveillance of activity and communication but now add performance, fantasy and dreams
  • are stories defined by the medium or do stories drive the medium?
  • could you experience someone else’s experience or would context always get in the way?

Some of the issues for LIS may include:

  • are immersive experiences documents?
  • indexing and versioning
  • retrieval systems
  • dissemination
  • preservation
  • information interaction behaviour
  • immersive literacy

Rest assured though.  If it does come and you’ve studied at CityLIS you are going to be prepared!

Digital Flânerie

Not much Flânerie this week as I was busy setting up my new computer. Next week is reading week so apart from heading to the British Library on Monday I’ll mostly be spending my week with my nose in a book (or its digital equivalent) and thinking about upcoming assignments.

Image Credits

Featured image: Heading up through the bubbles by Saspotato. Source: Flickr. (CC BY-NC-SA 2.0)