Array Data Structures & Algorithms

Array Data Structures & Algorithms

Files as Containers and File Processing Files, I/O Streams, Sequential and Direct Access File Processing Techniques Outline Storage Devices Concept of File File Streams and Buffers Sequential Access Techniques Direct Access Techniques Storage Devices John von Neumann first expressed the architecture of the stored program digital computer. Computer System Computer Input Device Data CPU Bus

Main Memory Output Device Control Five Main Components: 1. CPU 2. Main Memory (RAM) 3. I/O Devices 4. Mass Storage 5. Interconnection network (Bus) Bus Bus Secondar y Storage Device Storage Devices Most of our previous discussions have been centred on how the C language supports dealing with data in memory (RAM). How to declare and reference variables in a program (and the

actual data at run time) Expression of data in character string format (human centred) versus internal machine representations (machine centred) Data types Variables Aggregate data structures (eg. arrays, structs, unions, bit strings) Concepts and techniques of memory addressing Using pointers Direct access versus indirect access (dereferencing of a pointer) Now we turn our attention to concepts and techniques of files and file processing on mass storage devices We begin with the concept of a file. Concept of File The concept of a file derives from its use in business, government and other areas A folder containing multiple pieces of paper (or tape, film, etc), called records, containing information presented in differing ways A digital file retains the same conceptual characteristics

Aggregates of data of differing data types and representations Requires standardized structures for packaging and communicating data File devices are any suitable hardware that supports file processing techniques stdin and stdout utilize default devices, as does stderr Each of stdin/stdout/stderr is actually a pointer to a struct File processing is implemented through the operating system (O/S) as an intermediator Processing functions include opening, closing, seeking, reading, writing Access techniques to files fall into two general categories Sequential access usually variable length records Direct access must be fixed length records Concept of File We will adopt a logical perspective of a file. This is a simplified model based on assumptions It permits us to ignore many low-level details Variable length records Sequential Access File: File offset (Unpredictable)

Direct Access File: Fixed length records File offset (Predictable) = RecNum * RecLength File Streams and Buffers File Streams The and cost Buffers of I/O: Brief !! Typical input or output on most 1. Program send operations YourFile data devices require 1/1000s to transaction messageoftoseconds O/S complete. This is thousands, to millions,

2. O/S point to device API, allocate of times slower than memory or cpu I/O buffer based operations. Complicated file 3. O/S schemes send protocol wrappedand access (organizations message devicebeing developed algorithms) are to always 4. Device respond with message to speed up programs and reduce access directedtimes to proper I/O buffer

to data. 5. O/S move message to Program buffer(s) 6. Program process message data O/S 3 2 APIs I/O Buffers 5 6 User Program YourFile Executable logic Variables, Structures 4 1 Making and Breaking File Connections

When a program is loaded into RAM, the O/S is provided with Study Figure 11.4 in textbook. information about thethe default file system (stdin and stdout) to be File Control Block (FCB) It discusses the relationship between files on storage devices will be used and also whether additional FILE pointers, FILE structures and File File Name String needed

Control Blocks (FCB), and the Operating Note that stdin normally points at the keyboard, while stdout points at the System. File Offset (Bytes) monitor These be and modified to are referjust to specific files, using file redirection Note thatcan stdin stdout Access Mode (R,W,B,+) cmdline% FILE* pointers. a.out < Infile.dat > Outfile.dat In order to communicate with a file it is necessary, first, to open a . channel to the device where the file is located (or will be located, once created). When the program is finished with the file, it is

necessary to close the channel. All required functions are defined in All required information concerning the file attributes (characteristics) is contained in a C-defined data structure called FILE. FILE * filePtr ; // pointer to struct that will hold file attributes There can be many files opened at the same time, each using its own FILE structure and file pointer. Making and Breaking File Connections In order to communicate with a file it is necessary, first, to open a channel to the device where the file is located (or will be located, once created). When theEnd-of-File program is finished with the file, it is necessary to close the channel. Channels may be re-opened Differentand O/Ss use multiple times closed, different codes to to indicate A FILE pointer may

be re-assigned different files the EOF. Assuming the declaration: FILE * cfPtr1,Linux/Unix cfPtr2 ; //- declare dtwo C file pointers To open a file channel Windows - z cfPtr1 = fopen( MyNewFileName.dat, w ) ; // open for writing cfPtr2 = fopen( MyOldFileName.dat, r ) ; // open for reading To close a file channel fclose( cfPtr1 ) ; fclose( cfPtr2 ) ; Every file contains an end-of-file indicator that the O/S can detect and report. This is shown with an example while( ! feof( cfPtr1 ) ) printf( More data to deal with\n ) ;

Making and Breaking File Connections In the previous slide we saw the statements cfPtr1 = fopen( MyNewFileName.dat, w ) ; // open for writing cfPtr2 = fopen( MyOldFileName.dat, r ) ; // open for reading File access attributes are used to tell the operating system (and the background file handling system) what kind of file processing is intended by the program C supports three types of sequential file transactions, called modes Read (with fscanf) Write (with fprintf) Append There are combinations of these as well, using + r+ w+ a+ Later we will discuss one more mode binary (b) Making and Breaking File Connections Mode Description

r Open an existing file for reading only w Create a file for writing only. If the file currently exists, destroy its contents before writing to it. a Open an existing file or create a file for writing at the end of the file. r+ Open an existing file for update, including both reading and writing. w+ Create a file for update use (reading and writing). If the file already exists, destroy its current contents before writing. a+

Append: Open or create a file for update writing is done at the end of the file. Sequential Access Techniques Writing to a sequential file fprintf( cfPtr, FormatString [, Parameter list] ) ; Example: fprintf( cfPtr, %d %lf\n, intSum, floatAve ) ; fprintf( cfPtr, This a message string, no values\n ) ; Reading from a sequential file fscanf( cfPtr, FormatString [, Parameter list] ) ; Example: fscanf( cfPtr, %d%lf, &intSum, &floatAve ) ; fscanf( cfPtr, %s, stringVar ) ; Interpreting return values fopen NULL means no file exists fprintf returns number of parameters outputted, or failure of operation fscanf returns number of parameters inputted, or failure of operation feof returns 0 if EOF found, otherwise non-zero. Sequential Access Techniques There are two ways of re-reading a sequential file Close the file and then re-open it

considered quite inefficient Rewind the file to the beginning (reset the file offset value in the FCB) while leaving it open rewind( cfPtr ) ; Before moving on it should be noted that most files that contain character based data alone have variable record length, hence sequential access is the only kind of access that makes sense However, any file (including those with fixed length records) can be accessed sequentially. Direct Access Techniques Direct Access Techniques are also called Random Access techniques Random just means that a read or write operation can be performed directly at the position (within the file) desired As with the case of array data structures, direct access can

be performed at constant cost (almost!) By contrast, sequential access implies that we may need to move through multiple records before we finally arrive at the file position desired. Making and Breaking File Connections We now consider the statements cfPtr1 = fopen( MyNewFileName.dat, wb ) ; // open for writing cfPtr2 = fopen( MyOldFileName.dat, rb ) ; // open for reading C supports three types of fixed length file transactions, called binary modes Read binary Write binary Append binary There are combinations of these as well, using + rb+ wb+ ab+ The term binary refers to a bit-level machine representation of data (ie. not characters, necessarily) Ex. unsigned and signed binary, IEEE float and double, etc. Making and Breaking File Connections Mode

Description (all files are binary) rb Open an existing file for reading only wb Create a file for writing only. If the file currently exists, destroy its contents before writing to it. ab Open an existing file or create a file for writing at the end of the file. rb+ Open an existing file for update, including both reading and writing. wb+ Create a file for update use. If the file already exists, destroy its current contents before writing.

ab+ Append: Open or create a file for update writing is done at the end of the file. x C11 has recently introduced the write exclusive mode as well. We will not discuss or examine this but students should read about it. Direct Access Techniques Writing to a direct access file fwrite( &DataStruct, sizeof( DS_t ), NumRecs, cfPtr ) ; Reading from a direct access file fread( &DataStruct, sizeof( DS_t ), NumRecs, cfPtr ) ; Seeking a record in a direct access file int fseek( FILE * cfPtr, long int Offset, int Whence ) ; Offset just refers to sizeof( DS_t ) Whence is one of three standard values (defined in ) SEEK_SET - seek based on offset from beginning of file

SEEK_CUR seek based on relative offset from current file position SEEK_END - seek based on offset from end of file Concept of Direct Access File Direct Access File with Fixed length records: Absolute Record Offset Number Begin File Current position N-1 N-2 . . . . 3 2 1

0 From BEGIN : RecNum * RecLength End File From END : (N - 1 - NumRecs) * RecLength NumRecs * RecLength + Relative offset - Direct Access Techniques #include Example: Writing to a direct access file struct rec_t { int ID ; char Name[50] ; double Score ; }

int main( ) { FILE * cfPtr ; struct rec_t Rec ; // Assume 1 <= ID <= 100 cfPtr = fopen( Score.dat, w ) ; while( scanf( %d, &Rec.ID) != EOF ) { scanf( %s%lf, Rec.Name, &Rec.Score ) ; fseek( cfPtr, (Rec.ID 1)*sizeof( struct rec_t ), SEEK_SET ) ; fwrite( &Rec, sizeof( struct rec_t ), 1, cfPtr ) ; } return 0 ; } Direct Access Techniques Checking for errors fwrite Returns the number of items outputted. If this number is less than the 3d argument, then an error has occurred fread Returns the number of data items successfully inputted, or EOF fseek Returns a non-zero value if the seek cannot be performed correctly Direct Access Techniques

Some additional problems to consider: Sort a file by a special value (called a key) Merge two files into a single file, maintaining sorted order Store blocks of memory (RAM) to a file, then recover it later into memory (concept of virtual memory management) Develop a hierarchical technique for accessing files based on organizational patterns. Example: Index Sequential Access techniques Develop your own (simple) database system involving multiple files, all linked through index (ie. key) values. Many of these problems and techniques will be discussed more deeply in future Computer Science courses. Summary C File Processing, Files, I/O Streams, Sequential and Direct Access File Processing Techniques Topic Summary Storage Devices Concept of File File Streams and Buffers

Sequential Access Techniques Direct Access Techniques Study examples Adapt them to your own uses ! Study Chapter 11: File Processing Moving beyond RAM to include data on persistent storage in the file system. Reading Chapter 12: Data Structures Abstract data structures, dynamic memory allocation, using pointers and self-referential data structures, linked lists. Review Begin reviewing and preparing for Final Exam !

Recently Viewed Presentations

  • The Right to A Job, the Right Types of Projects

    The Right to A Job, the Right Types of Projects

    El impacto de la crisis económica mundial: Una perspectiva de género Rania Antonopoulos Quito, 26-27 Noviembre 2009 Foro Internacional "LA CRISIS MUNDIAL Y ECUADOR: CARACTERISTICAS, CONSECUENCIAS, OPORTUNIDADES Y DIMENSIONES DE GÉNERO"; MINISTERIO DE FINANZAS, UNIFEM, CI-GENERO,FLACSO AC DEMOCRACIA, ILDIS
  • HUMANISMO - Universidad Veracruzana

    HUMANISMO - Universidad Veracruzana

    Rogers ha extrapolado de experiencia e investigación en asesoria psicológica y psicoterapia y ha propuesto un nuevo enfoque para la educación, el cual concibe la enseñanza como una relación interpersonal facilitante, en la cual el facilitador se caracteriza por tres...
  • Rob Fergus Computer Vision 1. Object Category Recognition

    Rob Fergus Computer Vision 1. Object Category Recognition

    Rob Fergus Computational Photography Removing camera shake from a single image Image and Depth from a Conventional Camera with a Coded Aperture Sharp image Depth Map Captured image Lens with pattern in aperture + 2. 1. Original Output Original Output...
  • Product Selection Process:How to Achieve the Department of ...

    Product Selection Process:How to Achieve the Department of ...

    Product Selection Process:How to Achieve the Department of Defense Menu Standards. ... Practical food and menu guidelines to assist dining facility managers in developing menus that meet recommended nutrient intakes as prescribed by current nutrient standards . ... Model "fit"...
  • TRAUMA ASSESSMENT - Kudzu

    TRAUMA ASSESSMENT - Kudzu

    TRAUMA CASUALTY ASSESSMENT RIFLES LIFESAVERS Tactical Combat Casualty Care Care Under Fire "The best medicine on any battlefield is fire superiority" Minimal casualty assessment Immediate treatment of extremity hemorrhage Tactical Field Care Rapid trauma assessment and immediate treatment of life-threatening...
  • Collision Theory and Reaction Rate - Laurel County

    Collision Theory and Reaction Rate - Laurel County

    Collision Theory and Reaction Rate Notes Rates of Chemical Reactions The amount of time it takes for a chemical reaction to come to completion is called the reaction rate Reaction rates are different for different reactions They can range from...
  • Public Opinion and Political Socialization 1 0 Zhang

    Public Opinion and Political Socialization 1 0 Zhang

    Bettmann/Corbis. Is polling always accurate? 10.1. Not only did advance polls in 1948 predict the Republican nominee Thomas E. Dewey would defeat Democratic incumbent President Harry S Truman, but on the basis of early and incomplete vote tallies, some newspapers'...
  • Elements of Music - Ms. Dowling&#x27;s Website - Home

    Elements of Music - Ms. Dowling's Website - Home

    Elements of Music Colour or Mood Colour or Mood Colour is a fancy way to talk about the emotion or feeling. In music, the combination of all 10 other musical elements can make our brains perceive a piece of music...