A machine vision system gives a production line the ability to see, measure, and decide. It combines lighting, a lens, an image sensor, a processor, and inspection software to turn a scene into a pass or fail verdict, a measurement, or a position for a robot, all without human eyes. Across manufacturing it performs automated inspection, dimensional gauging, barcode and character reading, and robot guidance at line speed and around the clock.
The category spans three architectures of rising capability: the fixed-function vision sensor, the all-in-one smart camera, and the multi-camera PC-based system. This guide treats them together, then unpacks the camera interface standards, image sensors, lighting and optics, and the specifications that actually drive a selection decision.
Photo: Ginkgomaier, CC BY-SA 4.0, via Wikimedia Commons
This guide is written for industrial purchasing engineers and design engineers. Its 6 chapters run from system architectures, camera interface standards, image sensors, and lighting and optics to spec-sheet decoding and selection decisions, followed by 7 selection FAQs and manufacturer comparisons. Interface and component standards referenced include the EMVA's GenICam and EMVA 1288, the AIA (A3) GigE Vision, USB3 Vision, and Camera Link standards, and the JIIA CoaXPress standard.
Chapter 1 / 06
What is a Machine Vision System
A machine vision system is an integrated set of hardware and software that acquires images of a scene and automatically extracts useful information from them to drive an industrial decision. Its primary uses are automatic inspection and robot or process guidance: confirming that a part is present, correctly assembled, and free of defects, measuring a dimension, reading a barcode or printed text, or telling a robot exactly where to grip. Unlike a person glancing at a part, a vision system applies the identical decision criteria to every unit, runs continuously without fatigue, and produces a traceable digital record of each result.
Functionally a complete system has six building blocks. The lighting illuminates the part so the feature of interest stands out from its background. The lens forms an image of that feature on the sensor and sets the field of view and resolution. The image sensor, a CCD or CMOS chip, converts the focused light into a digital image. The processor, which may be an embedded SoC inside a smart camera or a separate industrial PC, runs the vision algorithms. The communication interface reports the result to a PLC or line controller over digital I/O or fieldbus. Finally, the inspection software ties the chain together with tools for locating, measuring, reading, and classifying. In a PC-based design a frame grabber sits between camera and computer to capture frames and offload acquisition.
The discipline grew out of factory automation in the 1980s, when dedicated image-processing hardware first read characters and checked part presence. Three later shifts shaped today's market. First, the transition from analog CCD to digital CMOS sensors, accelerated by Sony's Pregius global-shutter family, brought high frame rates and low noise at commodity prices. Second, the standardization of camera interfaces and the GenICam programming model, beginning in the early 2000s, let integrators mix cameras and software freely. Third, since roughly 2015, deep-learning tools added the ability to judge variable, hard-to-define defects that no fixed rule could encode, opening applications that classical algorithms could not reach.
The market reflects that breadth. Independent analyses put the global machine vision market at roughly 15 to 16 billion US dollars in 2024, growing at a high-single-digit annual rate toward the high-20-billion range by the early 2030s, driven by quality automation, semiconductor inspection, and AI-enabled applications. Japan's Keyence and the United States' Cognex together held about a quarter of the 2024 market, with Teledyne, Basler, Omron, and others filling component and integrated-system niches. No single product covers the whole span, so engineering selection is the act of mapping a specific inspection task to the right architecture, sensor, optics, and software.
Four engineering questions decide whether a vision project succeeds. Can the optics and lighting reliably expose the feature of interest with enough contrast? Does the sensor resolution put enough pixels across the smallest defect? Does the processing keep up with line speed? And does the software make a stable decision across the full range of acceptable parts? A project that fails usually fails on the first two, the physical front end, long before any algorithm is tuned, which is why experienced integrators spend their first effort on lighting and lensing.
Chapter 2 / 06
System Architectures and Types
Machine vision products fall into three architectures of rising flexibility and cost: the vision sensor, the smart camera, and the PC-based system. They share the same physics but differ in how much you can program and how many viewpoints they can manage. Choosing the wrong tier is a common and expensive mistake: a fixed vision sensor cannot grow into a complex multi-feature task, while a full PC-based rig is overkill for a single go/no-go check. The table below compares the three.
| Architecture | Processing | Cameras | Flexibility | Typical Applications |
| --- | --- | --- | --- | --- |
| Vision sensor | Embedded, fixed tools | 1 | Low (few clicks) | Presence/absence, single barcode, one pattern |
| Smart camera | Embedded SoC, programmable | 1 | Medium (tool library) | Multi-feature single-station inspection, OCR |
| PC-based system | Industrial PC, full software | 1 to 16+ | High (open) | Multi-camera, line-scan, deep learning, gauging |
Vision sensors are the entry tier. They package sensor, optics, light, and a fixed library into one rugged unit configured through a simple interface in minutes. Their performance envelope is intentionally limited to one well-defined job, such as confirming a cap is present, checking a label, or reading a single code. When the task is quick, standardized, and unlikely to change, a vision sensor is the most cost-effective and reliable choice. Keyence's IV3 series is a representative product. The trade-off is rigidity: a vision sensor cannot be reprogrammed for a substantially different inspection.
Smart cameras integrate image acquisition, processing, decision logic, and I/O into a single housing but expose a far richer, programmable tool set than a vision sensor, configured today through a graphical interface rather than hand-written C or C++. One smart camera handles multiple features from a single viewpoint, runs measurement, reading, and pattern tools, and increasingly embeds deep-learning inference on the device. Cognex In-Sight and Keyence CV-X are representative families. Smart cameras suit flexible, single-station inspection where compactness and self-contained operation matter, at a cost between a vision sensor and a PC-based system.
PC-based systems split the work across separate cameras, lighting, and a host industrial computer that manages every peripheral and runs the inspection software. This architecture offers the greatest flexibility and the highest performance: it ingests multiple high-resolution area-scan or line-scan cameras at once, applies the heaviest deep-learning workloads, and coordinates complex multi-station logic. It is the choice for very complex applications where many inspection tasks run fast and simultaneously on high-performance hardware. The cost is larger physical size, higher price, and more integration effort. Basler cameras with pylon software, paired with a frame grabber and a custom lens and light, are a common PC-based build.
Cutting across all three tiers is the 2D versus 3D distinction. A 2D system measures features in a plane: surface defects, print verification, code reading, and flat-part gauging. A 3D system reconstructs height, shape, or volume and is needed for weld and bead inspection, planarity and warpage, bin picking, and robust pose estimation. Mainstream 3D methods include laser triangulation (used by laser profilers), structured light, stereo vision, and time of flight, each trading resolution against speed and standoff distance. Because 3D resists the lighting and contrast variation that defeats 2D but costs more and runs slower, many lines pair a fast 2D cosmetic station with a 3D dimensional station.
A further architectural choice is area-scan versus line-scan imaging. An area-scan camera captures a full rectangular frame in one exposure and suits discrete parts that can be brought to rest or strobed. A line-scan camera captures a single row of pixels at very high rate and builds a continuous image as the product moves under it, which makes it the standard for web, sheet, and roll inspection of paper, film, textile, and metal coil. Line-scan rates reach hundreds of kilohertz, so they demand high-bandwidth interfaces, the subject of the next chapter.
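The link between product speed and line rate can be made concrete with a little arithmetic. The sketch below assumes square pixels on the object (one captured line per pixel-height of travel) and uses hypothetical figures for web speed, object-side pixel size, and pixels per line; the function names are illustrative, not from any vendor library.

```python
def required_line_rate_hz(web_speed_mm_s: float, object_pixel_mm: float) -> float:
    """Lines per second needed so that each captured line covers one
    object-side pixel of travel (keeps pixels square in the built image)."""
    if web_speed_mm_s <= 0 or object_pixel_mm <= 0:
        raise ValueError("speed and pixel size must be positive")
    return web_speed_mm_s / object_pixel_mm


def line_data_rate_mb_s(pixels_per_line: int, line_rate_hz: float,
                        bytes_per_pixel: float = 1.0) -> float:
    """Raw image data rate in MB/s (1 MB = 1e6 bytes)."""
    return pixels_per_line * line_rate_hz * bytes_per_pixel / 1e6


# Hypothetical web line: 2 m/s, 0.05 mm per pixel, 8192-pixel sensor, 8-bit mono.
rate_hz = required_line_rate_hz(2000.0, 0.05)
print(f"line rate: {rate_hz / 1000:.1f} kHz")
print(f"data rate: {line_data_rate_mb_s(8192, rate_hz):.1f} MB/s")
```

Under these assumed numbers the camera already needs several hundred megabytes per second, which is why line-scan work tends to push toward the faster interfaces.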
Chapter 3 / 06
Camera Interface Standards
In a PC-based or component-level system, the camera interface decides how much image data can move, how far, and at what cost. Four families dominate: GigE Vision, USB3 Vision, Camera Link and Camera Link HS, and CoaXPress. They are not interchangeable; each balances bandwidth against cable length, power delivery, and whether a frame grabber is required. The table below compares the practical figures engineers use to size an interface.
| Interface | Bandwidth | Max Cable Length | Power / Frame Grabber | Best For |
| --- | --- | --- | --- | --- |
| GigE Vision (1 Gbit/s) | ~115 MB/s | 100 m | PoE, no grabber | Multi-camera factory cells, long cable runs |
| 10GigE Vision | ~1,150 MB/s | 100 m (copper) | No grabber | High-res over distance |
| USB3 Vision | ~380 MB/s | 3 to 5 m | Bus power, no grabber | Single-camera benchtop, OEM |
| Camera Link | up to ~850 MB/s | ~10 m | Grabber required | Deterministic low-latency capture |
| Camera Link HS | 10+ Gbit/s per lane | fiber, long | Grabber required | Extreme-bandwidth line-scan |
| CoaXPress (CXP-12) | 12.5 Gbit/s per channel | 30 to 40 m | Power over coax, grabber required | High-res, high-speed line-scan |
GigE Vision runs the camera protocol over standard Gigabit Ethernet, giving roughly 115 MB/s of usable image throughput over up to 100 m of low-cost Cat-6 cable, with Power over Ethernet able to feed the camera on the same wire. Its long reach, cheap cabling, and switch-based multi-camera fan-out make it the workhorse of distributed factory installations. The same physical cabling scales to 5GigE and 10GigE, lifting throughput toward roughly 580 and 1,150 MB/s respectively when a single camera outgrows 1 Gbit/s. GigE Vision is administered by AIA, now part of A3, the Association for Advancing Automation.
USB3 Vision uses SuperSpeed USB 3.0 to deliver about 380 MB/s with simple plug-and-play and bus power, but its passive cable is limited to roughly 3 to 5 m before active or fiber extension is needed. It is ideal for single-camera benchtop, laboratory, and OEM integration where the camera sits close to the host. Like GigE Vision, USB3 Vision is an AIA (A3) standard and needs no frame grabber.
Camera Link, defined by AIA in 2000, standardized the connection between camera and frame grabber for high-performance imaging. It provides deterministic, low-latency transfer over roughly 10 m at up to about 850 MB/s in its 80-bit (Deca) configuration, or about 680 MB/s in the Full configuration. That made it a long-time staple of high-speed inspection, though it always requires a frame grabber. Camera Link HS is its successor, carrying 10 Gbit/s and more per lane, typically over fiber for long reach, to feed the highest-bandwidth area-scan and line-scan cameras.
CoaXPress, a JIIA standard, sends high-speed data over ordinary 75-ohm coaxial cable while simultaneously delivering power and an uplink control channel on the same coax. The current CXP-12 generation reaches 12.5 Gbit/s per channel, about 1,250 MB/s, over 30 to 40 m, and channels can be aggregated for cameras that need several gigabytes per second. With its blend of very high bandwidth, useful cable length, and single-cable power, CoaXPress is the interface of choice for high-resolution sensors and line-scan cameras running at scan rates up to several hundred kilohertz. It requires a CoaXPress frame grabber.
Tying these interfaces together is GenICam, the Generic Interface for Cameras maintained by the European Machine Vision Association (EMVA). GenICam abstracts each camera behind a uniform programming model: GenApi reads an XML description of the camera's features, SFNC standardizes feature names, and GenTL provides a transport-layer producer that hides the physical link. Because GigE Vision, USB3 Vision, Camera Link, and CoaXPress all expose themselves through GenICam, one software stack can address compliant cameras across different interfaces with the same feature names, and the EMVA 1288 standard lets buyers compare sensor performance such as quantum efficiency and noise on a common, vendor-neutral basis.
Chapter 4 / 06
Image Sensors, Lighting and Optics
The image sensor, the lighting, and the lens form the physical front end of a vision system, and they determine whether the feature of interest is even visible before any software runs. Get this front end wrong and no algorithm can recover the missing contrast or resolution. This chapter covers the three components and the lighting geometries that expose different defect types.
Image sensor: CCD versus CMOS. The sensor converts focused light into a digital image, and the two technologies are the Charge-Coupled Device (CCD) and the Complementary Metal-Oxide-Semiconductor (CMOS) sensor. CCD historically offered uniform, low-noise images but reads out slowly and consumes more power. Modern CMOS sensors, led by Sony's Pregius global-shutter line, now match or exceed CCD on noise while reading out far faster and at lower cost and power, so CMOS has largely displaced CCD in machine vision. A second axis is shutter type: a global shutter exposes all pixels simultaneously and renders moving parts without distortion, while a rolling shutter exposes one line at a time and skews fast-moving objects. Any application with motion should specify a global-shutter sensor. The third axis is color: a monochrome sensor has no Bayer color filter, so every pixel captures all visible wavelengths and yields higher effective resolution and sensitivity, while a color sensor adds a filter array that absorbs unwanted wavelengths so each pixel is tuned to one color. Use monochrome unless color itself is the inspected feature.
Lighting is the single most decisive and most underrated element. It is chosen by the feature to be revealed, not by raw brightness, and locking it down before tuning software is the mark of an experienced integrator. The table below maps the five common geometries to what they expose.
| Lighting | Geometry | Reveals | Typical Use |
| --- | --- | --- | --- |
| Backlight | Behind part, into camera | Sharp silhouette / outline | Edge gauging, presence, holes |
| Ring light | Around lens, front | General bright field | Flat, non-reflective parts |
| Dome (diffuse) | All angles, cloudy-day | Even light, no glare | Curved or specular surfaces |
| Coaxial / on-axis | Perpendicular, through-lens | Flat reflective detail | Engravings, flat metal, wafers |
| Dark-field | Low angle, 0 to 45 deg | Scratches, edges, embossing | Defects on smooth surfaces |
Backlighting places an even, high-intensity source behind the part so it appears as a crisp black silhouette, which is the most reliable way to gauge outer dimensions, count holes, or verify presence. A ring light mounted around the lens gives convenient general front illumination for flat, low-reflectance parts. A dome light, the most diffuse source available, aims a ring of LEDs at a white diffuse interior so light reaches the part from a very wide range of angles, washing out the glare that ruins images of curved or shiny surfaces. Coaxial illumination, often projected through the lens itself, strikes a flat reflective surface perpendicularly to read engravings and inspect wafers and flat metal. Dark-field illumination orients light at a low 0 to 45 degree angle so a smooth surface reflects it away and appears dark, while scratches, edges, and embossing catch the light and stand out. Wavelength is a further lever: red and infrared raise contrast on certain materials and penetrate some films, while shorter blue wavelengths sharpen fine detail.
Optics. The lens captures and focuses light onto the sensor and sets image clarity, resolution, and the field of view, so its choice directly limits the smallest feature the system can detect. A conventional entocentric lens shows perspective: parts farther from the lens appear smaller, which introduces measurement error for parts with depth. A telecentric lens keeps magnification constant across its depth of field, eliminating that perspective error, which makes it the standard for precision dimensional gauging despite a larger size and higher cost. Key lens parameters are focal length, which together with working distance fixes the field of view, the sensor format the lens must cover (for example 1/1.8 inch, 2/3 inch, or 1.1 inch), and the maximum aperture, which trades light gathering against depth of field. Some telecentric and macro lenses integrate coaxial illumination directly, simplifying the optical path for flat reflective targets.
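How focal length and working distance fix the field of view, and why perspective matters for parts with depth, can be sketched with a simple pinhole approximation. This is a rough first-pass estimate that assumes the working distance is much larger than the focal length; the 8.8 mm sensor width (a nominal figure for a 2/3 inch sensor), the 25 mm lens, and the function names are assumptions for illustration, and a real lens datasheet or vendor calculator should be used for the final choice.

```python
def field_of_view_mm(sensor_mm: float, focal_length_mm: float,
                     working_distance_mm: float) -> float:
    """Approximate field of view along one sensor axis (pinhole model,
    valid when working distance is much larger than focal length)."""
    return sensor_mm * working_distance_mm / focal_length_mm


def focal_length_for_fov_mm(sensor_mm: float, fov_mm: float,
                            working_distance_mm: float) -> float:
    """Approximate focal length that yields the requested field of view."""
    return sensor_mm * working_distance_mm / fov_mm


def perspective_shrink_fraction(working_distance_mm: float,
                                depth_mm: float) -> float:
    """Fraction by which a feature appears smaller through an entocentric
    lens when it sits depth_mm farther away than the calibrated plane."""
    return depth_mm / (working_distance_mm + depth_mm)


# Assumed setup: 8.8 mm wide sensor, 25 mm lens, 300 mm working distance.
print(f"FOV: {field_of_view_mm(8.8, 25.0, 300.0):.1f} mm")
print(f"focal length for 100 mm FOV: {focal_length_for_fov_mm(8.8, 100.0, 300.0):.1f} mm")
print(f"size error at 5 mm depth: {perspective_shrink_fraction(300.0, 5.0) * 100:.2f} %")
```

In this assumed setup a feature only 5 mm deeper than the reference plane already reads more than one percent small, which is the error a telecentric lens removes.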
Chapter 5 / 06
Key Specification Parameters
A vision camera or system spec sheet can list dozens of parameters, but only a handful truly drive a selection decision. The seven below decide whether the system resolves the feature, keeps up with the line, and integrates with the controls. Each is explained so a purchasing engineer can read a datasheet critically.
Resolution is quoted in megapixels and, more usefully, in pixels across the field of view. The number that matters is spatial resolution: the field of view divided by the pixel count gives the real-world size each pixel represents. To detect a defect reliably, allow at least 3 to 5 pixels across its smallest dimension. For example, resolving a 0.1 mm feature across a 100 mm field with 3 pixels needs about 3,000 pixels across, which points to a 5 to 12 MP sensor depending on aspect ratio. Specifying resolution from the defect size up, rather than buying the biggest sensor available, avoids both blind under-resolution and wasted bandwidth.
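The sizing rule above reduces to a short calculation. The sketch below reproduces the 100 mm field and 0.1 mm feature example; the function names and the 4:3 aspect ratio are assumptions chosen for illustration.

```python
import math


def pixels_required(fov_mm: float, smallest_feature_mm: float,
                    pixels_per_feature: int = 3) -> int:
    """Pixels needed across the field of view so the smallest feature
    spans pixels_per_feature pixels."""
    raw = fov_mm * pixels_per_feature / smallest_feature_mm
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(raw, 6))


def megapixels_for(width_px: int, aspect_w: int = 4, aspect_h: int = 3) -> float:
    """Sensor megapixels implied by a required width and an aspect ratio."""
    height_px = math.ceil(width_px * aspect_h / aspect_w)
    return width_px * height_px / 1e6


width = pixels_required(fov_mm=100.0, smallest_feature_mm=0.1, pixels_per_feature=3)
print(f"pixels across: {width}")
print(f"mm per pixel: {100.0 / width:.4f}")
print(f"megapixels at 4:3: {megapixels_for(width):.2f}")
```

Raising the margin to 5 pixels per feature pushes the same field to 5,000 pixels across, which shows how quickly the pixel budget grows with the safety factor.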
Frame rate or, for line-scan cameras, line rate, is the number of images or lines captured per second, and it must exceed the line throughput with margin for processing. Area-scan rates range from tens to several hundred frames per second; line-scan rates reach hundreds of kilohertz. High frame rates demand both a fast global-shutter sensor and a high-bandwidth interface, which is why frame rate, resolution, and interface choice must be sized together rather than in isolation.
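Sizing frame rate, resolution, and interface together comes down to comparing the raw data rate with each link's usable throughput. The sketch below uses the approximate throughput figures quoted in Chapter 3; the 80 percent headroom factor, the example camera, and the function names are assumptions, and real throughput depends on the specific camera, host, and cabling.

```python
# Approximate usable throughput in MB/s, taken from the figures in Chapter 3.
INTERFACE_MB_S = {
    "GigE Vision": 115,
    "USB3 Vision": 380,
    "5GigE Vision": 580,
    "Camera Link (80-bit)": 850,
    "10GigE Vision": 1150,
    "CoaXPress CXP-12 (1 channel)": 1250,
}


def data_rate_mb_s(width_px: int, height_px: int, fps: float,
                   bytes_per_pixel: float = 1.0) -> float:
    """Raw image data rate in MB/s (1 MB = 1e6 bytes)."""
    return width_px * height_px * bytes_per_pixel * fps / 1e6


def interfaces_that_fit(rate_mb_s: float, headroom: float = 0.8) -> list[str]:
    """Interfaces whose throughput, derated by an assumed headroom factor,
    still carries the required data rate."""
    return [name for name, cap in INTERFACE_MB_S.items()
            if rate_mb_s <= cap * headroom]


# Hypothetical camera: 2448 x 2048 pixels, 8-bit mono, 60 frames per second.
rate = data_rate_mb_s(2448, 2048, 60)
print(f"required: {rate:.1f} MB/s")
print("candidates:", interfaces_that_fit(rate))
```

Cable length, power delivery, and frame-grabber cost then narrow the candidate list further.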
Sensor format and pixel size set the optical context. The format (such as 1/1.8 inch, 2/3 inch, or 1.1 inch) determines which lenses can cover the sensor without vignetting. Pixel size, often in the 2 to 5 micrometre range, trades light sensitivity against resolution: larger pixels gather more light and lower noise, while smaller pixels pack more resolution onto the same chip. Match the lens image circle to the sensor format, or the corners of the image will darken and blur.
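Checking lens coverage is a matter of comparing the sensor diagonal with the lens image circle. The sketch below derives the diagonal from pixel count and pixel size; the example sensor (2448 x 2048 pixels at 3.45 micrometres) and the two image-circle values are hypothetical datasheet figures, and the function names are illustrative.

```python
import math


def sensor_diagonal_mm(width_px: int, height_px: int, pixel_um: float) -> float:
    """Active-area diagonal in millimetres from pixel count and pixel pitch."""
    return math.hypot(width_px, height_px) * pixel_um / 1000.0


def lens_covers_sensor(image_circle_mm: float, diagonal_mm: float) -> bool:
    """True when the lens image circle is at least as large as the sensor diagonal."""
    return image_circle_mm >= diagonal_mm


# Hypothetical sensor: 2448 x 2048 pixels, 3.45 um pitch.
diag = sensor_diagonal_mm(2448, 2048, 3.45)
print(f"sensor diagonal: {diag:.2f} mm")

# Hypothetical lens image circles read from two datasheets.
for circle_mm in (9.0, 17.6):
    verdict = "covers" if lens_covers_sensor(circle_mm, diag) else "vignettes"
    print(f"{circle_mm} mm image circle: {verdict}")
```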
Shutter type and exposure. As covered earlier, a global shutter is mandatory for moving parts; specify it explicitly. Minimum exposure time, paired with strobed lighting, determines how fast a part can move during capture without blur. Dynamic range, the ratio between the brightest and darkest signal a sensor captures in one frame, governs whether bright and dark regions are both readable, which matters for shiny metal next to dark cavities.
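The relationship between part speed, exposure time, and blur can be estimated directly. The sketch below assumes motion parallel to the image plane and a blur budget of one pixel; the speed, object-side pixel size, and function names are hypothetical values for illustration.

```python
def blur_pixels(speed_mm_s: float, exposure_s: float, mm_per_pixel: float) -> float:
    """Distance the part travels during exposure, expressed in pixels."""
    return speed_mm_s * exposure_s / mm_per_pixel


def max_exposure_s(speed_mm_s: float, mm_per_pixel: float,
                   allowed_blur_px: float = 1.0) -> float:
    """Longest exposure (or strobe pulse) that keeps blur within the budget."""
    return allowed_blur_px * mm_per_pixel / speed_mm_s


# Hypothetical line: part moving at 500 mm/s, 0.05 mm per pixel.
t_max = max_exposure_s(500.0, 0.05)
print(f"max exposure: {t_max * 1e6:.0f} us")
print(f"blur at 1 ms exposure: {blur_pixels(500.0, 1e-3, 0.05):.1f} px")
```

If the camera's minimum exposure is longer than the computed limit, a shorter strobe pulse can freeze the motion instead, provided the light is bright enough.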
Spectral response describes the wavelengths the sensor detects, including near-infrared sensitivity, and must be matched to the chosen lighting wavelength.
Interface and protocol cover both the camera interface (GigE Vision, USB3 Vision, Camera Link, CoaXPress) and the result-output protocol to the controller. The output side commonly includes:
Discrete digital I/O: simple pass/fail and trigger lines wired to a PLC, the most universal interface.
EtherNet/IP, PROFINET, EtherCAT: industrial fieldbuses that carry results and parameters to a line controller.
TCP/IP, Modbus, OPC UA, RS-232: data export to MES, databases, and supervisory systems.
Camera control via GenICam: standardized feature access for acquisition settings on PC-based systems.
Software and tool capability is the seventh and decisive specification. Confirm the system provides the inspection tools the task needs: locating and alignment, dimensional gauging, barcode and data-matrix reading, OCR, blob and edge analysis, and, where defects are variable, deep-learning classification and anomaly detection. For PC-based builds, libraries such as MVTec HALCON and Cognex VisionPro Deep Learning supply both classical and AI tools; for smart cameras, the embedded tool library defines the ceiling of what the device can ever do.
Chapter 6 / 06
Selection Decision Factors
To convert the previous five chapters into a specific system, follow the decision sequence below. Most selection failures come not from a single wrong part but from skipping the front-end physics and jumping straight to a camera or software. These eight steps double as a fixed RFQ template.
Define the inspection task and pass/fail criteria: State precisely what feature is judged, the smallest defect that must be caught, and the acceptable range of good parts. Decide whether the measurement lives in a plane (2D) or needs height, shape, or volume (3D). This single definition drives every later choice.
Solve lighting and optics first: Pick the lighting geometry and wavelength that expose the feature with maximum contrast (Chapter 4), then choose the lens, using a telecentric lens for precision gauging. Prove the feature is visible to the eye on a monitor before specifying a camera.
Size resolution from the defect: Compute pixels across the field of view to put 3 to 5 pixels on the smallest defect, then select sensor megapixels accordingly. Choose monochrome unless color is the inspected feature, and a global shutter for any moving part.
Match frame rate and interface to line speed: Confirm the camera captures fast enough for throughput with processing margin, then choose the interface (GigE Vision, USB3 Vision, Camera Link, CoaXPress) that carries that bandwidth over the required cable length, adding a frame grabber where the interface demands one.
Choose the architecture: Map the task to a vision sensor (simple, fixed go/no-go), a smart camera (single-station multi-feature), or a PC-based system (multi-camera, line-scan, high-speed, or heavy deep learning). Confirm GenICam, GigE Vision, or USB3 Vision compliance for component-level builds.
Specify software and tools: List the required tools (gauging, code reading, OCR, alignment, deep-learning defect detection) and verify the platform supplies them. Where defects are variable or amorphous, budget for labeled-image collection and a deep-learning tool such as HALCON or VisionPro Deep Learning.
Define integration and environment: Result-output protocol (digital I/O, EtherNet/IP, PROFINET, OPC UA), mounting and standoff, enclosure ingress rating for washdown or dust, ambient light shielding, and temperature and vibration on the line.
Evaluate total cost of ownership: Camera and lens and lighting, frame grabber and PC, software licenses, integration labor, calibration, and the cost of false rejects or escapes. A cheaper front end that cannot hold contrast across the part range generates scrap and downtime that quickly exceed the initial saving.
One last commonly overlooked dimension is manufacturer serviceability and longevity: local application-engineering and calibration support, spare-part and replacement-camera availability over a multi-year line life, GenICam and interface-standard compliance that allows a future camera swap, and software upgrade and retraining paths for deep-learning models as the product mix evolves. Cognex, Keyence, Basler, Teledyne, Omron, SICK, Allied Vision, and MVTec have established support and supply networks in major manufacturing regions, making them dependable choices for production lines expected to run for many years.
FAQ
What is the difference between a vision sensor, a smart camera, and a PC-based vision system?
All three capture an image and make a decision, but they differ in flexibility. A vision sensor has a fixed, limited tool set for one job (presence/absence, a barcode, a single pattern) and is configured with a few clicks. A smart camera integrates sensor, optics, processor, and I/O in one housing and runs a richer programmable tool library, so it suits multi-feature inspection on one viewpoint. A PC-based system separates cameras, lighting, and a host computer; it handles multiple high-resolution or line-scan cameras, the heaviest deep-learning workloads, and complex multi-station logic. Rule of thumb: simple go/no-go to a vision sensor, single-station multi-tool to a smart camera, multi-camera or high-speed or heavy compute to a PC-based system.
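The rule of thumb can be written as a small decision function. This is a deliberately simplified sketch: the parameter names and thresholds are assumptions made for illustration, and a real selection also weighs cost, size, and the embedded tool library.

```python
def suggest_architecture(cameras: int, tools: int, line_scan: bool = False,
                         high_speed: bool = False,
                         heavy_deep_learning: bool = False) -> str:
    """Map a task description to one of the three architecture tiers."""
    if cameras < 1 or tools < 1:
        raise ValueError("need at least one camera and one inspection tool")
    # Multi-camera, line-scan, high-speed, or heavy compute: PC-based system.
    if cameras > 1 or line_scan or high_speed or heavy_deep_learning:
        return "PC-based system"
    # Several tools on one viewpoint: smart camera.
    if tools > 1:
        return "smart camera"
    # One simple go/no-go check: vision sensor.
    return "vision sensor"


print(suggest_architecture(cameras=1, tools=1))
print(suggest_architecture(cameras=1, tools=4))
print(suggest_architecture(cameras=6, tools=4, line_scan=True))
```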
Which camera interface should I choose: GigE Vision, USB3 Vision, Camera Link, or CoaXPress?
Match interface to bandwidth and cable length. GigE Vision delivers about 115 MB/s (1 Gbit/s) over up to 100 m of low-cost Cat-6 cable and supports Power over Ethernet, making it the default for multi-camera factory cells; 5GigE and 10GigE scale the same cabling to roughly 580 and 1,150 MB/s. USB3 Vision gives about 380 MB/s with bus power but only up to 3 to 5 m passive cable. Camera Link offers deterministic, low-latency transfer to about 850 MB/s over roughly 10 m and needs a frame grabber. CoaXPress (CXP-12) reaches 12.5 Gbit/s per coax channel (about 1,250 MB/s) over 30 to 40 m and powers the camera through the coax, which suits high-resolution and high-speed line-scan work. Camera Link HS extends to 10 Gbit/s+ per lane over fiber for the most demanding jobs.
What is GenICam and why does interface standardization matter?
GenICam (Generic Interface for Cameras) is a standard, maintained by the European Machine Vision Association (EMVA), that gives every compliant camera a uniform programming interface regardless of physical link. Its modules include GenApi (feature description via an XML file), SFNC (Standard Features Naming Convention), and GenTL (the transport-layer producer that abstracts the hardware). GigE Vision and USB3 Vision, both administered by AIA (now part of A3, the Association for Advancing Automation), sit on top of GenICam. The practical payoff: one software stack can address a Basler, Teledyne, or other compliant camera over different interfaces with the same feature names, so you can swap hardware without rewriting acquisition code, and EMVA 1288 lets you compare sensor performance on a common test basis.
How do I choose a machine vision lighting technique?
Lighting is selected by the feature you need to expose, not by brightness. Backlighting places the source behind the part and renders a sharp silhouette, ideal for outer-edge gauging and presence checks. A ring light mounted around the lens gives bright, general front illumination for flat, non-reflective parts. A dome (cloudy-day) light floods the field from all angles to wash out glare on curved or specular surfaces. Coaxial (on-axis) light projects perpendicular to a flat reflective surface to read engravings and flat metal. Dark-field light at a low 0 to 45 degree angle catches only scratches, edges, and embossing, leaving the smooth background dark. Choosing wavelength matters too: red and infrared improve contrast and penetrate some films, while blue sharpens fine detail. Lock down lighting before tuning software, because no algorithm recovers contrast that the optics never captured.
What resolution and sensor type do I need for a given inspection?
Start from the smallest defect you must reliably detect, then allow 3 to 5 pixels across it. Sensor pixels needed equal the field of view divided by the desired spatial resolution. For example, a 100 mm field that must resolve a 0.1 mm feature with 3 pixels needs roughly 3,000 pixels across, so a 5 to 12 MP camera. Choose a global-shutter sensor for any moving part, because a rolling shutter exposes lines sequentially and skews fast-moving objects; global shutter exposes all pixels at once. Use monochrome unless color is the inspected feature, because mono sensors have no Bayer color filter and so deliver higher effective resolution and light sensitivity. Modern CMOS sensors such as the Sony Pregius global-shutter family have largely displaced CCD in machine vision on speed, noise, and cost.
Do I need a 3D vision system or is 2D enough?
Use 2D when the inspected feature lives in a plane: presence/absence, surface defects, print and label verification, barcode and OCR reading, and gauging of flat parts. Use 3D when height, volume, shape, or robust pose are the measurement: bead and weld inspection, planarity and warpage, bin picking, and volume estimation. Common 3D methods are laser triangulation (a laser profiler sweeps a line and reconstructs a height map), structured light, stereo vision, and time of flight, each trading resolution against speed and standoff. 3D resists lighting and contrast variation that defeat 2D, but it costs more and runs slower, so many lines combine a fast 2D station for cosmetic checks with a 3D station for dimensional ones.
When is deep-learning vision worth it over rule-based machine vision?
Rule-based (classical) machine vision excels at deterministic tasks with stable appearance: precise gauging, alignment, barcode and matrix reading, and measuring known geometry. It is fast, explainable, and needs no training images. Deep learning earns its place where defects are variable, textured, or hard to define by rule: cosmetic scratches on grained surfaces, organic and amorphous shapes, mixed-appearance assemblies, and deformed-character OCR. Tools such as Cognex VisionPro Deep Learning and MVTec HALCON deep-learning operators classify, segment anomalies, locate features, and read text from a few hundred to a few thousand labeled images. Many real lines are hybrid: classical tools fixture and gauge, a deep-learning tool judges the cosmetic defect that no fixed rule can encode. Budget for labeled-image collection and retraining as the production mix changes.