Levels of Achievement

 

** All captured images must be in an uncompressed, lossless format with a minimum resolution of 5760 x 3840 pixels.

1st Level of Achievement (Specimen Imaging):

1.1 A single dorsal image of each specimen is captured with a minimum resolution of 5760 x 3840 pixels. The entire dorsal surface of the specimen is visible in the image and is in sharp focus. The image is not obscured by other specimens or objects other than the pin.
1.2 A single lateral image of each specimen is captured with a minimum resolution of 5760 x 3840 pixels. The entire lateral surface of the specimen is visible in the image and is in sharp focus. The image is not obscured by other specimens or objects.
1.3 A single lateral image of each specimen and the arrangement of any labels and capsules that are attached to the pin and/or associated with the specimen is captured with a minimum resolution of 5760 x 3840 pixels.
1.4 All captured specimen images must be traceable to ensure proper identification of images from the same specimen.
1.5 The system successfully handles overlapping specimens (Tray 4, specimens 3 and 5, will serve as the example). Acceptable approaches include a warning system indicating that human intervention may be needed at a later date, software that recognizes and digitally corrects overlapping sections, or any other method that addresses the issue of overlapping specimens.

The Entrant's system may capture more images than those required above. However, all of the above images are required to satisfy the level of achievement.
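The Level 1 image requirements lend themselves to an automated pre-submission check. The following is a minimal sketch and not part of the contest rules: the `CapturedImage` record, the view names, and the assumption that the 5760 x 3840 minimum applies regardless of image orientation are all illustrative choices.

```python
from dataclasses import dataclass

MIN_LONG_SIDE = 5760   # from the rules: minimum 5760 x 3840 pixels
MIN_SHORT_SIDE = 3840

# Hypothetical view names standing in for criteria 1.1, 1.2 and 1.3.
REQUIRED_VIEWS = {"dorsal", "lateral", "lateral_with_labels"}


@dataclass(frozen=True)
class CapturedImage:
    specimen_id: str  # shared key that keeps images traceable (criterion 1.4)
    view: str
    width: int
    height: int


def meets_min_resolution(img: CapturedImage) -> bool:
    """Assumption: a portrait image of 3840 x 5760 also satisfies the minimum."""
    long_side = max(img.width, img.height)
    short_side = min(img.width, img.height)
    return long_side >= MIN_LONG_SIDE and short_side >= MIN_SHORT_SIDE


def missing_views(images: list[CapturedImage]) -> dict[str, set[str]]:
    """Map each specimen_id to the required views it lacks at sufficient resolution."""
    seen: dict[str, set[str]] = {}
    for img in images:
        views = seen.setdefault(img.specimen_id, set())
        if meets_min_resolution(img):
            views.add(img.view)
    return {
        sid: REQUIRED_VIEWS - views
        for sid, views in seen.items()
        if REQUIRED_VIEWS - views
    }
```

An empty result from `missing_views` would indicate that every specimen has all three views at the minimum resolution. Focus and occlusion (criteria 1.1, 1.2 and 1.5) are not covered by this check.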

2nd Level of Achievement (Enhanced Specimen Imaging):

2.1 All "1st Level of Achievement" criteria are met.
2.2 Each specimen image contains a small color reference chart (e.g., Calibr8 Digital Color Chart SG - Extra Small 2.4" x 1.6").
2.3 Each specimen image contains an algorithmically-generated or physical size reference scale.
2.4 For all specimens, the resolution must be sufficient to produce a sharp image at 10x magnification when viewed on a screen.
2.5 Each specimen image must contain minimal white space. No more than 25% of the image may contain white space, including the frame space filled by the specimen, the color reference chart and the reference scale.
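The 25% limit in criterion 2.5 can be estimated by counting near-white pixels. This is a rough sketch under stated assumptions: the rules do not define "white space" numerically, so the threshold of 245 on an 8-bit grayscale image is a hypothetical value, and the image is represented here as plain rows of gray values rather than through any particular imaging library.

```python
WHITE_THRESHOLD = 245       # assumption: 8-bit gray values at or above this count as white
MAX_WHITE_FRACTION = 0.25   # from criterion 2.5


def white_fraction(gray_rows: list[list[int]]) -> float:
    """Fraction of pixels at or above WHITE_THRESHOLD in a grayscale image given as rows."""
    total = sum(len(row) for row in gray_rows)
    if total == 0:
        raise ValueError("image has no pixels")
    white = sum(1 for row in gray_rows for value in row if value >= WHITE_THRESHOLD)
    return white / total


def passes_white_space_limit(gray_rows: list[list[int]]) -> bool:
    """True when no more than 25% of the pixels are classified as white."""
    return white_fraction(gray_rows) <= MAX_WHITE_FRACTION
```

A simple pixel count like this treats pale areas of the specimen or the color chart as white space, so a production check would likely need to mask those regions first.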

3rd Level of Achievement (Label Imaging):

3.1 All "2nd Level of Achievement" criteria are met.
3.2 The size reference scale is generated or selected dynamically to ensure that it is no larger than 2x the specimen's longest dimension. Units are in centimeters and millimeters.
3.3 All labels affixed to the pin are captured. The data on the final image to be used for optical character recognition (OCR) are not obscured. Label capture includes the ability to fully capture images of labels that are touching, upside-down, or double-sided.
3.4 Each label image should meet the minimum sufficient resolution for OCR. The "x"-height of a captured label image is the height of a lower case "x" within the label. An "x"-height greater than or equal to 20 pixels is required.
3.5 All captured label and specimen images must be traceable to ensure proper identification of label and specimen images that are from the same specimen.
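Criteria 3.2 and 3.4 both reduce to simple arithmetic. The sketch below illustrates one possible reading, with several assumptions: the list of available scale-bar lengths is hypothetical, the "largest bar that fits" policy is one choice among many, and the x-height helpers assume the capture resolution at the label plane is known in pixels per inch.

```python
MM_PER_INCH = 25.4
MIN_X_HEIGHT_PX = 20  # from criterion 3.4

# Hypothetical set of available scale-bar lengths, in millimeters.
SCALE_BAR_OPTIONS_MM = (0.5, 1, 2, 5, 10, 20, 50, 100)


def pick_scale_bar_mm(longest_dimension_mm: float):
    """Largest available bar no longer than 2x the specimen (criterion 3.2), or None."""
    limit = 2 * longest_dimension_mm
    fitting = [bar for bar in SCALE_BAR_OPTIONS_MM if bar <= limit]
    return max(fitting) if fitting else None


def x_height_px(x_height_mm: float, pixels_per_inch: float) -> float:
    """Height in pixels of a lower case "x" that is x_height_mm tall on the label."""
    return x_height_mm / MM_PER_INCH * pixels_per_inch


def min_pixels_per_inch(x_height_mm: float) -> float:
    """Capture resolution needed for the x-height to reach 20 pixels."""
    return MIN_X_HEIGHT_PX * MM_PER_INCH / x_height_mm
```

For example, a label printed with a 1 mm x-height would need roughly 508 pixels per inch at the label plane to reach the 20-pixel minimum, and a 12 mm specimen would receive the 20 mm bar from the option list above.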

4th Level of Achievement (Label OCR Capture):

4.1 All "3rd Level of Achievement" criteria are met.
4.2 OCR will be executed on all captured label data for each specimen.
4.3 Each specimen's data will be inserted as a single record into a database of any type. Raw text generated by OCR will be inserted into text fields associated with the specimen, with a separate field for each label captured during the imaging process. For purposes of this Competition, the database must be hosted online, for ease of access for all of the judges.
4.4 Specimen and label image files will also be stored in the database with each specimen record.
4.5 Ancillary captured data from the unit tray, such as images of unit tray labels not associated with a particular specimen (frequently the genus and/or species of the specimens in the tray), will be imaged, processed through OCR, and inserted into the database for each specimen within that unit tray.
4.6 Type-written labels written in English must be translated in the database with an accuracy of at least 90%.
4.7 A specimen with labels containing any hand-written elements that are not recognized by OCR must be flagged for later human intervention within that specimen's record in the database.
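The rules do not state how the 90% accuracy in criterion 4.6 is measured. One common convention, assumed here purely for illustration, is character-level accuracy: the share of characters in a hand-checked reference transcription that OCR reproduced, based on the Levenshtein edit distance. The function names below are hypothetical.

```python
REQUIRED_ACCURACY = 0.90  # from criterion 4.6


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,          # delete char_a
                current[j - 1] + 1,       # insert char_b
                previous[j - 1] + cost,   # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Share of reference characters recovered, clamped to the range 0..1."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = min(edit_distance(reference, ocr_text), len(reference))
    return (len(reference) - errors) / len(reference)


def meets_required_accuracy(reference: str, ocr_text: str) -> bool:
    return character_accuracy(reference, ocr_text) >= REQUIRED_ACCURACY
```

Under this convention, a single wrong character in a ten-character label gives exactly 90% and would just pass. A word-level measure would be stricter and is an equally plausible reading of the criterion.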

5th Level of Achievement (OCR Data Parsing and Natural Language Processing):

5.1 All "4th Level of Achievement" criteria are met.
5.2 Specimen data obtained through OCR will be inserted into the database after being parsed into appropriate fields (rather than as a single text field). Fields will correspond to the Darwin Core schema (reference http://rs.tdwg.org/dwc/terms/#theterms). The minimum Darwin Core field set is listed below (elements may be left blank if they are not present on the specimen label):
  • scientificName
  • genus
  • specificEpithet
  • recordedBy: Collector name(s) (people, groups, or organizations)
  • verbatimEventDate: Date specimen was collected from the field
  • verbatimLocality: Location information (typically includes distance and direction from a town, the county, the state, and the country if outside the United States)
  • verbatimLatitude
  • verbatimLongitude
  • catalogNumber: Bar code number or other identifier for the specimen
  • institutionCode: Institution name or code where the specimen is held
  • collectionCode: Collection name or code to which the specimen belongs
5.3 Data elements present on the label(s) other than those identified above must be parsed and stored in a single text field within the database to denote that human intervention is required.
5.4 A minimum of 60% of parsed data (60% accuracy over all of the data for a drawer) will be accurately assigned to the appropriate Darwin Core field.
5.5 A maximum of three minutes is encouraged for the digitization of each specimen and associated labels and capsules. Translation of label data into database elements via OCR, data parsing, and natural language processing may occur automatically during post-processing without any time limitation, except that it must be completed prior to submission and, if selected as a finalist, must be completed in time to be evaluated in both the Phase 1 and Phase 2 on-site demonstrations.
5.6 Metadata associated with the digitization process are captured in the database and include the following:
  • A drawer identifier
  • A timestamp showing when the data capture process began
  • A timestamp showing when the data capture process was completed (when the drawer can be safely removed)
  • A specimen counter indicating how many records are in the database for this drawer
  • An operator identifier indicating who was overseeing the digitization technology
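Criteria 5.2, 5.3 and 5.6 together describe a record layout. The sketch below shows one way it might look, using Python's built-in `sqlite3` module with an in-memory database for illustration only (the rules require an online-hosted database of any type). The table names, the `unparsed_text` column, and the helper functions are hypothetical; only the Darwin Core field names come from the list above.

```python
import sqlite3

# Minimum Darwin Core field set from criterion 5.2.
DWC_FIELDS = (
    "scientificName", "genus", "specificEpithet", "recordedBy",
    "verbatimEventDate", "verbatimLocality", "verbatimLatitude",
    "verbatimLongitude", "catalogNumber", "institutionCode", "collectionCode",
)


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a drawer metadata table (5.6) and a specimen table (5.2, 5.3)."""
    dwc_columns = ", ".join(f'"{name}" TEXT' for name in DWC_FIELDS)
    conn.execute(
        "CREATE TABLE drawer (drawer_id TEXT PRIMARY KEY, capture_started TEXT, "
        "capture_completed TEXT, operator_id TEXT)"
    )
    conn.execute(
        "CREATE TABLE specimen (specimen_id TEXT PRIMARY KEY, "
        "drawer_id TEXT NOT NULL REFERENCES drawer(drawer_id), "
        f"{dwc_columns}, unparsed_text TEXT)"
    )


def insert_specimen(conn, specimen_id, drawer_id, parsed, unparsed_text=""):
    """Insert one record; Darwin Core fields absent from `parsed` stay NULL (blank)."""
    unknown = set(parsed) - set(DWC_FIELDS)
    if unknown:
        raise ValueError(f"not in the minimum Darwin Core set: {sorted(unknown)}")
    columns = ["specimen_id", "drawer_id", "unparsed_text", *parsed]
    quoted = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO specimen ({quoted}) VALUES ({placeholders})",
        [specimen_id, drawer_id, unparsed_text, *parsed.values()],
    )


def specimen_count(conn, drawer_id) -> int:
    """Specimen counter for a drawer, derived from the stored records."""
    row = conn.execute(
        "SELECT COUNT(*) FROM specimen WHERE drawer_id = ?", (drawer_id,)
    ).fetchone()
    return row[0]
```

Deriving the specimen counter with a query, rather than storing it, keeps it consistent with the records actually present. Whether the counter should instead be stored as captured is a design decision the rules leave open.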

6th Level of Achievement (Data Presentation):

6.1 All "5th Level of Achievement" criteria are met.
6.2 All specimen, label and ancillary images are presented via a Web interface on a single Web page.
6.3 Raw text generated by OCR is also presented on the page, with data parsed into the appropriate Darwin Core fields.
6.4 Edit capabilities to enable expert correction of raw text generated by OCR will be provided for each data field.
6.5 Drag-and-drop capabilities will be provided to enable text to be dragged into the appropriate Darwin Core field by an expert reviewer.
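As a rough illustration of criteria 6.3 and 6.4, the sketch below renders the raw OCR text next to one editable input per parsed field. It is a minimal server-side example with hypothetical function names, not a prescribed design. Images (6.2) and drag-and-drop (6.5) would need additional markup and client-side scripting that are not shown.

```python
from html import escape


def render_field(name: str, value: str) -> str:
    """One editable text input for a parsed Darwin Core field (criterion 6.4)."""
    safe_name = escape(name, quote=True)
    safe_value = escape(value, quote=True)
    return f'<label>{safe_name} <input name="{safe_name}" value="{safe_value}"></label>'


def render_record(raw_ocr_text: str, parsed: dict[str, str]) -> str:
    """Raw OCR text shown alongside the parsed, editable fields (criterion 6.3)."""
    fields = "\n".join(render_field(name, value) for name, value in parsed.items())
    return f"<form>\n<pre>{escape(raw_ocr_text)}</pre>\n{fields}\n</form>"
```

Escaping every value before it is placed in the page matters here because OCR output is untrusted text and may contain characters such as `<` or `"`.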

For full contest rules, visit the Complete Contest Rules and Materials section.

 

 


National Science Foundation Logo    American Institute of Biological Sciences Logo