Workflow
for file transformations
After: Nicholas Thieberger (2004) Documentation in practice: developing
a linked media corpus of South Efate. In Peter K. Austin (ed.) Language Documentation and Description,
Vol 2, 169-178. London: SOAS. book contents and
ordering information
Key: The diagram above follows, though not completely, standard
flowchart symbols, whereby a diamond is a decision, a
rectangle is a process, etc.
Compare the DoBeS
Workflow Diagrams linked from Technical
Framework for the DOBES Main Phase.
Workflow
for audio file management in a field methods course
- record digital file — the Marantz
PMD670 records a BWF with minimal
header at 16bit 48kHz stereo (what? n1); if
a
24bit recorder becomes available it is preferable
- copy to computer HD by USB or Firewire
- rename:
- replace .bwf
suffix with .wav (NB: on
a Mac, a
file can be renamed in the Get Info window)
- assign a PI (permanent identifier) base name, e.g. KY1-20050713-01 (what's this
mean? n5), and so rename
to PI.wav
- add metadata record to spreadsheet catalogue (where?
n6) of project
recordings (OR, soon, add an entry to the Paradisec online
work catalogue)
- copy file PI.wav
(from the rename step above) and the spreadsheet catalogue entry (from the metadata step above) to
the class resource storage (volume LSA301 on LSA Server lsa.dlp.mit.edu)
- convert file format on computer
- to 16bit 44.1kHz stereo (audio CD format), using QuickTime Pro
— this creates a file of size 10MB per minute — and name this file PI_CD.wav
- to MP3 format at 128 kb/sec (kbps)
— this creates a file of size <1MB per minute with .mp3 suffix (on a Mac, use iTunes (how? n2))
- burn a CD (how? n3)
- an audio CD, if required, from PI_CD.wav
- a data CD, with both PI.wav
and PI.mp3
- the PI.wav or PI.mp3 (or PI_CD.wav) file can now be
used for
transcription; e.g. Transcriber creates a companion file PI.trs
- when appropriate batches are ready, notify Paradisec (who? n4)
- after processing at Paradisec, the copy of the PI.wav
file, with metadata header information added via imp.xml, becomes the
preservation master file,
in BWF
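The file sizes quoted in the conversion step can be checked with simple arithmetic. The following is a minimal Python sketch, not part of the original workflow; the function names are hypothetical, and it assumes uncompressed PCM audio, a constant-bit-rate MP3, and 1 MB = 1,000,000 bytes, ignoring file headers.

```python
def pcm_bytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Size of one minute of uncompressed PCM audio, ignoring the file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


def mp3_bytes_per_minute(bitrate_kbps):
    """Size of one minute of constant-bit-rate MP3 (1 kbps = 1000 bits per second)."""
    return bitrate_kbps * 1000 // 8 * 60


MB = 1_000_000  # assumption: decimal megabytes

# 16bit 48kHz stereo, as recorded by the PMD670
print(pcm_bytes_per_minute(48_000, 16, 2) / MB)   # 11.52
# 16bit 44.1kHz stereo (audio CD format), i.e. PI_CD.wav
print(pcm_bytes_per_minute(44_100, 16, 2) / MB)   # 10.584
# MP3 at 128 kbps
print(mp3_bytes_per_minute(128) / MB)             # 0.96
```

These figures agree with the roughly 10MB per minute given for the CD-format file and the <1MB per minute given for the MP3.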
Notes
1. Audio
file formats (Wikipedia)
2. How to use iTunes to make an MP3 file
- Set up iTunes before dropping a file in the iTunes
window: in
Preferences, Importing, choose 'MP3 encoder';
"128k bit [per second] has become the de facto 'good enough' standard" [source]. NB: avoid the variable
bit rate (VBR)
encoder (which is an option available under 'Custom...')
- Drag and drop the input file onto the iTunes window
- Select file, then under the Advanced menu item choose
'Convert
Selection
to MP3'
3. How to burn a CD
- To create a data CD, on a Mac, insert a blank CD (or DVD
if the
machine has a SuperDrive), then drag the files onto it, then click on
the "burn" symbol (a yellow and black sectored circle)
- To create an audio CD, drag and drop the audio file into
the
iTunes window, and use the Burn command within iTunes
4. Paradisec
is located at the University of Sydney and is the primary archive for
recordings of Pacific-area languages.
5. File-naming convention: the base name consists of
three consecutive parts separated by hyphens (-):
- collection identifier (e.g. KY1)
- date (in yyyymmdd format)
- session identifier (e.g. 01,
or identifier of particular interviewer(s), or analogue tape side A or B)
NB: Within the KY1 collection,
we propose to adopt the convention that a session identifier refers to
interviewers and the sequence number of those interviewers' session.
Use a hyphen only between parts. A period (.) is only used at the end
of the base name before a suffix.
A name has three parts: the first two constitute a unique identifier
for an "item" and the third completes the filename. We use the
date for the second part.
Example: KY1-20050713-ALL01 names a recording in the
KY1 collection, recorded
on 13 July 2005, in session ALL01
(which means the interviewer(s) were ALL and it was the 01 session by those
interviewers).
An MP3 format version of this recording is KY1-20050713-ALL01.mp3. And KY1-20050721-DP03.wav names
the file in WAV format recorded on 21 July 2005 in the session DP03, i.e. the 03 session by interviewer(s) DP.
See Filenaming
conventions in PARADISEC
information for depositors.
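The naming convention in this note can be checked mechanically. The following is a minimal Python sketch, not an official PARADISEC tool; the names `NAME_RE`, `RecordingName` and `parse_name` are hypothetical, and it assumes that collection and session identifiers contain only letters and digits. It does not handle derived names such as PI_CD.wav.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# collection-yyyymmdd-session, with an optional .suffix
# (assumption: identifiers are alphanumeric only)
NAME_RE = re.compile(
    r"^(?P<collection>[A-Za-z0-9]+)"
    r"-(?P<date>\d{8})"
    r"-(?P<session>[A-Za-z0-9]+)"
    r"(?:\.(?P<suffix>[A-Za-z0-9]+))?$"
)


class RecordingName(NamedTuple):
    collection: str
    recorded: date
    session: str
    suffix: Optional[str]


def parse_name(filename: str) -> RecordingName:
    """Split a name like KY1-20050713-ALL01.mp3 into its parts."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not in collection-yyyymmdd-session form: {filename!r}")
    # strptime raises ValueError for impossible dates such as 20051340
    recorded = datetime.strptime(m["date"], "%Y%m%d").date()
    return RecordingName(m["collection"], recorded, m["session"], m["suffix"])


print(parse_name("KY1-20050713-ALL01.mp3"))
print(parse_name("KY1-20050721-DP03.wav"))
```

A name that uses a period or extra hyphen inside the base name, or a date that does not exist, raises `ValueError`.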
6. A spreadsheet template is available at Paradisec's download page.
This requires Microsoft Excel. If you do not have access to
Microsoft Office, have a look at this text
version; or install OpenOffice
Send amendments to David.Nash
AT
anu.edu.au
Created 13 April 2005
Modified 17 July 2005
URL http://www.anu.edu.au/linguistics/nash/LSA.301/flow.html