Workflow for file transformations


[Figure: workflow chart]
After: Nicholas Thieberger (2004). Documentation in practice: developing a linked media corpus of South Efate. In Peter K. Austin (ed.), Language Documentation and Description, Vol. 2, 169-178. London: SOAS.

Key: The chart above largely, though not fully, follows standard flowchart symbols: a diamond is a decision, a rectangle is a process, and so on.

Compare the DoBeS Workflow Diagrams linked from the Technical Framework for the DOBES Main Phase.

Workflow for audio file management in a field methods course

  1. record a digital file: the Marantz PMD670 records a BWF (Broadcast Wave Format) file with a minimal header at 16-bit 48 kHz stereo (what? n1); a 24-bit recorder would be preferable if one becomes available
  2. copy the file to the computer's hard disk via USB or FireWire
  3. rename:
    1. replace the .bwf suffix with .wav (NB: on a Mac, a file can be renamed in the Get Info window)
    2. assign a PI (permanent identifier) base name, e.g. KY1-20050713-01 (what does this mean? n5), and rename the file to PI.wav
  4. add a metadata record for the recording to the spreadsheet catalogue of project recordings (where? n6) (or, soon, add an entry to the Paradisec online work catalogue)
  5. copy PI.wav (from step 3) and its spreadsheet catalogue entry (from step 4) to the class resource storage (volume LSA301 on the LSA server lsa.dlp.mit.edu)
  6. convert the file to other formats on the computer:
    1. to 16-bit 44.1 kHz stereo (audio CD format) using QuickTime Pro, naming the result PI_CD.wav; this creates a file of about 10 MB per minute of audio
    2. to MP3 at 128 kbit/s (kbps), naming the result PI.mp3; this creates a file of less than 1 MB per minute (on a Mac, use iTunes (how? n2))
  7. burn CDs (how? n3):
    1. an audio CD, if required, from PI_CD.wav
    2. a data CD containing both PI.wav and PI.mp3
  8. PI.wav, PI.mp3 or PI_CD.wav can now be used for transcription; for example, Transcriber creates a companion file PI.trs
  9. when a suitable batch of files is ready, notify Paradisec (who? n4)
  10. after processing at Paradisec, the copy of PI.wav with metadata added to its header (via imp.xml) becomes the preservation master file, in BWF
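Steps 3 and 6 can be sketched in Python. This is a minimal illustration, not part of the course tooling: the function names are hypothetical, the rename assumes the recorder file is already on the computer's disk (step 2), and the size figures count raw audio data only, ignoring WAV headers and MP3 frame overhead. Because BWF is WAV with extra metadata chunks, renaming only changes the suffix; the audio itself is untouched.

```python
from pathlib import Path
from typing import Dict

# Hypothetical helpers illustrating steps 3 and 6; not part of any named tool.


def pcm_bytes_per_minute(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Raw size of uncompressed PCM audio (WAV/BWF) per minute, header excluded."""
    return sample_rate_hz * (bits // 8) * channels * 60


def mp3_bytes_per_minute(bitrate_kbps: int) -> int:
    """Size of constant-bit-rate MP3 audio per minute (kilobits/s -> bytes/min)."""
    return bitrate_kbps * 1000 // 8 * 60


def derived_names(pi: str) -> Dict[str, str]:
    """Filenames the workflow derives from one permanent identifier (PI)."""
    return {
        "master": f"{pi}.wav",      # step 3
        "cd": f"{pi}_CD.wav",       # step 6, audio CD format
        "mp3": f"{pi}.mp3",         # step 6, MP3
        "transcript": f"{pi}.trs",  # step 8, Transcriber companion file
    }


def rename_recording(src: Path, pi: str) -> Path:
    """Step 3: rename a recorder file (e.g. *.bwf) to PI.wav in the same folder.

    Only the name changes; the audio data is not rewritten.
    """
    if src.suffix.lower() not in (".bwf", ".wav"):
        raise ValueError(f"expected a .bwf or .wav file, got {src.name}")
    dest = src.with_name(f"{pi}.wav")
    if dest.exists():
        # refuse to overwrite an existing file with the same PI
        raise FileExistsError(str(dest))
    src.rename(dest)
    return dest


if __name__ == "__main__":
    for label, size in [
        ("16-bit 48 kHz stereo (as recorded)", pcm_bytes_per_minute(48_000, 16, 2)),
        ("16-bit 44.1 kHz stereo (PI_CD.wav)", pcm_bytes_per_minute(44_100, 16, 2)),
        ("MP3 at 128 kbps (PI.mp3)", mp3_bytes_per_minute(128)),
    ]:
        print(f"{label}: {size / 1_000_000:.2f} MB per minute")
```

Run as a script, the loop prints about 11.52, 10.58 and 0.96 MB per minute for the three formats, in line with the per-minute figures given in step 6. The actual format conversion still needs a tool such as QuickTime Pro or iTunes; the sketch only renames files and estimates sizes.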

Notes

1. Audio file formats (Wikipedia)

2. How to use iTunes to make an MP3 file
  1. Before dropping a file into the iTunes window, set up iTunes: in Preferences, Importing, choose 'MP3 encoder'; "128k bit [per second] has become the de facto 'good enough' standard" [source]. NB: avoid the variable bit rate (VBR) encoder option (available under 'Custom...')
  2. Drag and drop the input file onto the iTunes window
  3. Select the file, then from the Advanced menu choose 'Convert Selection to MP3'
3. How to burn a CD
  1. To create a data CD on a Mac, insert a blank CD (or a DVD, if the machine has a SuperDrive), drag the files onto it, and click the "burn" symbol (a yellow and black sectored circle)
  2. To create an audio CD, drag and drop the audio file into the iTunes window, and use the Burn command within iTunes
4. Paradisec is located at the University of Sydney and is the primary archive for recordings of Pacific-area languages.

5. File-naming convention: a base name consists of three consecutive parts separated by hyphens (-):
  1. collection identifier (e.g. KY1)
  2. date (in yyyymmdd format)
  3. session identifier (e.g. 01, or identifier of particular interviewer(s), or analogue tape side A or B)
    NB: Within the KY1 collection, we propose that a session identifier consist of an identifier for the interviewer(s) followed by the sequence number of their session.
Use a hyphen only between parts. Use a period (.) only at the end of the base name, before a suffix.
A name has three parts: the first two together form a unique identifier for an "item", and the third completes the filename. We use the date as the second part.

Example: KY1-20050713-ALL01 names a recording in the KY1 collection, recorded on 13 July 2005, in session ALL01 (which means the interviewer(s) were ALL and it was the 01 session by those interviewers).  An MP3 format version of this recording is KY1-20050713-ALL01.mp3. And KY1-20050721-DP03.wav names the file in WAV format recorded on 21 July 2005 in the session DP03, i.e. the 03 session by interviewer(s) DP.
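The naming rules above are regular enough to check mechanically. The Python sketch below is one possible reading of them, not a PARADISEC tool: parse_name, split_session and ItemName are hypothetical names, and the character sets (letters and digits in each part, a hyphen only between parts, one optional period before the suffix) are assumptions. Allowing an underscore in the session part, so that a name like PI_CD.wav from the workflow still parses, is a further assumption not stated in the convention.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional, Tuple

# collection-yyyymmdd-session, then an optional .suffix (assumed character sets)
NAME_RE = re.compile(
    r"^(?P<collection>[A-Za-z0-9]+)"
    r"-(?P<date>\d{8})"
    r"-(?P<session>[A-Za-z0-9_]+)"  # '_' allowed only so PI_CD.wav parses (assumption)
    r"(?:\.(?P<suffix>[A-Za-z0-9]+))?$"
)

# Proposed KY1 session form: interviewer letters, then a sequence number, e.g. ALL01
SESSION_RE = re.compile(r"^(?P<interviewers>[A-Za-z]*)(?P<seq>\d*)$")


class ItemName(NamedTuple):
    collection: str
    recorded: date
    session: str
    suffix: Optional[str]

    @property
    def item_id(self) -> str:
        """The first two parts together identify the "item"."""
        return f"{self.collection}-{self.recorded:%Y%m%d}"


def parse_name(filename: str) -> ItemName:
    """Split a filename into its parts; raise ValueError if it breaks the convention."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not of the form collection-yyyymmdd-session: {filename!r}")
    # strptime also rejects impossible dates such as 20051332
    recorded = datetime.strptime(m["date"], "%Y%m%d").date()
    return ItemName(m["collection"], recorded, m["session"], m["suffix"])


def split_session(session: str) -> Tuple[str, str]:
    """Split a KY1-style session identifier into (interviewers, sequence number)."""
    m = SESSION_RE.match(session)
    if m is None:
        raise ValueError(f"session {session!r} is not letters followed by digits")
    return m["interviewers"], m["seq"]
```

With these definitions, parse_name("KY1-20050713-ALL01.mp3") yields collection KY1, the date 13 July 2005, session ALL01 and suffix mp3, and split_session("DP03") yields ("DP", "03"). A name such as KY1.20050713.01.wav is rejected because it uses periods between the parts.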

See Filenaming conventions in PARADISEC information for depositors.

6. A spreadsheet template is available from Paradisec's download page. It requires Microsoft Excel. If you do not have access to Microsoft Office, see this text version, or install OpenOffice.


Send amendments to  David.Nash AT anu.edu.au

Created 13 April 2005
Modified 17 July 2005
URL http://www.anu.edu.au/linguistics/nash/LSA.301/flow.html